ABSTRACT


The  explosion  of  digital  technology  provides  the  warrior  with  the  potential  to 
exploit the battlespace in ways previously unknown. Unfortunately, this godsend is a
two-edged sword. Although it promises the military commander greater situational
awareness,  the  resulting  tidal  wave  of  data  impairs  his  decision-making  capacity.  More 
data  is  not  needed;  enhanced  information  and  knowledge  are  essential. 

This  study  built  upon  the  Mean  Separator  Neural  Network  (MSNN)  signal 
classification tool originally proposed by Duzenli (1998) and modified it for increased
robustness.  MSNN  variants  were  developed  and  investigated.  One  modification 
involved  input  data  preconditioning  prior  to  neural  network  processing.  A  second 
modification  incorporated  projection  space  variance  in  a  re-defined  performance 
parameter  and  in  a  newly  defined  training  termination  criterion.  These  alternative  MSNN 
architectures  were  measured  against  the  standard  MSNN,  a  single-layer  perceptron,  and  a 
statistical  classifier  using  data  of  varying  input  dimensionality  and  noise  power. 
Classification  simulations  performed  using  these  techniques  measured  the  accuracy  in 
categorizing  data  objects  composed  of  artificial  features  and  features  extracted  from 
synthetic  communication  signals.  The  projection  space  modification  variant  exceeded  all 
classifiers  under  noise-free  conditions  and  performed  comparably  to  the  standard  MSNN 
in  noisy  environments.  The  preconditioned  input  method  produced  a  poorer  response 
under  most  situations. 
Spirit. Without you, Lord, nothing is possible; but with you, everything is gained.
I. INTRODUCTION


A.  BACKGROUND 

In  “A  Maritime  Strategy  for  the  Naval  Century,”  Admiral  Jay  L.  Johnson,  Chief 
of  Naval  Operations,  declared,  “Just  as  naval  forces  command  the  operational  domain  of 
the  seas,  we  seek  to  command  cyberspace,  by  harnessing  today’s  technology  to 
revolutionize  naval  operations”  (Johnson,  2000).  The  explosion  of  digital  technology 
has  indeed  paved  the  way  for  the  revolution  in  military  operations  currently  enjoyed. 
Advances  in  undersea  warfare,  the  cooperative  engagement  capability,  space  and 
terrestrial  communications,  and  computer  networks  provide  the  warrior  with  the  potential 
to exploit the battlespace in ways previously unknown. Unfortunately, this godsend is a
two-edged sword. Although it gives the military commander the promise of attaining
greater  situational  awareness,  the  tidal  wave  of  data  severely  impairs  his  decision-making 
capacity.  Instead  of  assisting,  the  data-rich,  information-poor,  and  knowledge-starved 
warfighter  is  incapacitated  and  confused  by  the  abundance  of  data  that  inundates  him. 
More  data  is  not  needed;  enhanced  information  and  knowledge  are  essential. 

B.  THESIS  OBJECTIVES 

This  proof  of  concept  study  continues  the  development  of  the  Mean  Separator 
Neural  Network  (MSNN)  classification  tool  originally  proposed  by  Duzenli  and  Fargues 
for  identification  of  underwater  signals,  modifying  it  to  increase  performance  robustness 
(Duzenli  and  Fargues,  1998).  As  a  key  component  in  the  warfighter’s  observe-orient- 
decide-act  loop,  decision  tools  like  the  MSNN  signal  classifier  promote  data  evolution  to 
information. Using The MathWorks' MATLAB 5, version 5.3, modifications of this
neural network are developed to improve its classification capabilities. The intent is to
increase  performance  robustness  and  thereby  improve  data  categorization  by  accounting 
for  statistical  parameters  not  considered  in  the  original  MSNN  formulation.  It  is  entirely 
expected  that  incorporation  of  these  additional  attributes  will  increase  computational 
burden;  but  the  effects  of  this  extra  load  are  expected  to  be  unremarkable  and  therefore 
will  not  be  rigorously  monitored.  The  implementation  of  the  proposed  MSNN  schemes 
will be measured against two unrelated techniques used as benchmarks: (1) a quadratic
classifier  modeled  purely  on  the  statistical  characteristics  of  the  input  data  and  (2)  a 
single  layer  perceptron  neural  network.  The  accuracy  of  each  classification  method  will 
be  verified  by  its  precision  in  properly  typing  artificial  feature  vectors  and  features 
extracted  from  simulated  signal  modulations.  If  proved  successful,  the  altered  MSNN 
method  offers  a  technique  that  will  assist  the  warfighter  in  attaining  greater  battlespace 
and  infosphere  acuity. 

C.  THESIS  ORGANIZATION 

Following  this  introduction,  Chapter  II  presents  artificial  neural  networks. 
Chapter III delves into a principal application of neural networks: pattern recognition and
classification. The bases of the quadratic statistical classifier, perceptron neural network,
and MSNN schemes are introduced and examined. In Chapter IV, these classification
techniques  are  tested  through  trial  simulations.  Analysis  of  the  results  provides 
rudimentary  insight  into  the  feasibility  of  each  classifier.  Next,  Chapter  V  assesses  the 
classification  techniques  considered  by  categorizing  synthetic  communication  signals. 
Feature  extraction  is  briefly  discussed  to  emphasize  this  aspect  of  signal  classification. 
Chapter  VI  summarizes  the  results  of  this  study  and  recommends  avenues  for  continued 
work. 

Appendix  A  details  an  important  proof  of  perceptron  neural  networks:  the  Fixed- 
Increment  Convergence  Theorem.  Appendix  B  contains  the  empirical  results  of  the 
Chapters  IV  and  V  investigations.  Appendix  C  documents  the  MATLAB  code  written  to 
conduct  the  experimental  portions  of  this  study. 
II. NEURAL NETWORKS


The  military  commander  needs  advanced  applications  to  complement  the 
advancing  appliances  that  have  become  commonplace  in  today’s  society.  Indeed, 
Moravec  claims  that  by  the  year  2030,  desktop  computers  will  have  the  processing  power 
equal  to  the  human  brain  (Moravec,  1999).  But  such  capabilities  are  useless  unless  they 
simplify  the  mundane  tasks  dealt  with  on  a  routine  basis  and  assist  in  times  of  crisis.  For 
the warfighter, this amounts to creating decision aids that not only ingest data but also
convey knowledge. As a stepping stone to attaining such knowledge management
capabilities, tools that communicate information to the operator, and not just deliver
data, are required.

The  Mean  Separator  Neural  Network  at  the  focus  of  this  thesis  is  designed  to 
impart  information.  Used  as  a  signal  classifier,  this  network  converts  raw  data  to  useful 
information  about  the  target  source.  But  to  understand  how  this  system  operates,  a  basic 
understanding  of  neural  networks  may  prove  useful.  This  chapter  provides  this 
fundamental  insight  into  neural  networks,  starting  with  the  biological  inspiration  for  such 
devices:  the  brain. 

A.  BIOLOGICAL  INSPIRATION 

As  the  name  implies,  neural  networks  are  structured  after  the  workings  of  the 
brain.  The  question  to  ask  then  becomes  why  and  what  advantage  does  this  provide  over 
conventional  computational  devices?  Indeed,  studies  have  shown  that  neurons  in  the 
human  brain  are  much  slower  than  silicon  logic  gates.  The  computers  of  1991,  for 
example, were five to six orders of magnitude faster than the brain. Single events that
take nanoseconds to process in computers require milliseconds in the cerebral cortex.
Yet,  it  is  common  knowledge  that  the  human  brain  is  more  powerful  than  even  today’s 
computers.  For  instance,  perceptual  recognition  takes  100-200  milliseconds  for  people, 
but  requires  days  for  computers.  In  accomplishing  such  tasks,  the  brain  is  also  much 
more energy efficient. While computers consume $10^{-6}$ Joules/sec per operation, the
energy expenditure of the brain is only $10^{-16}$ Joules/sec per operation. If computers
process individual instructions more quickly than the brain, how does the biological
neural  network  operate  more  efficiently? 

The brain achieves such performance levels by utilizing a highly complex, non-linear
network of parallel processing units. Nearly a quadrillion ($10^{15}$) connections link
the one hundred billion processing elements (called neurons) that make up the brain.
Shown in Figure II-1, these neurons are composed of three principal components: the
dendrites, the axon, and the cell body. The dendrites and axons are the communication
lines  that  convey  electro-chemical  messages  between  adjacent  neurons.  Dendrites  are  the 
receptive  appendages;  axons,  the  transmission  appendages.  The  connections  formed  by 
these  components  are  the  brain’s  synaptic  links.  Between  the  dendrites  and  axon, 
information  is  processed  by  the  cell  body.  The  arrangement  of  the  neurons,  the  strength 
of the synaptic links, and the summing and thresholding of the cell body determine the
processing  power  of  the  biological  neural  network.  (Haykin,  1994,  pp.  1-4),  (Hagan,  et 
al,  1996,  pp.  1-8  -  1-9) 


Figure II-1. Biological Neuron. From Ref. [Hagan, et al, 1996, p. 1-8]
B. COMPUTER IMITATION

Because  of  its  massively  parallel  and  complex  structure,  the  brain  operates  more 
efficiently  than  conventional  computers.  It  is  this  capability  that  artificial  neural 
networks  strive  to  replicate.  Like  the  anatomical  prototype,  artificial  neural  networks  use 
experiential  knowledge  to  understand  and  interact  with  the  environment.  That  is, 
artificial neural networks learn. The artificial network processes input data to approximate
a situation and stores this learned information as “synaptic” weights. Hence, an artificial
processing element can be modeled after the biological neuron, as shown in Figure II-2.
In this diagram, the weighted input links, w, replace the dendrites and synapses; a linear
summer and a non-linear activation function, $\varphi$, the cell body; and the output link, a, the
axon. As a result, the artificial neuron output, defined in Figure II-2 as

$$a = \varphi(\mathbf{w}^T \mathbf{p} + b), \tag{2.1}$$

illustrates  that  the  non-linear  activation  function,  like  the  cell  body,  determines  the 
neuron’s  characteristic  ability  to  solve  specific  problems. 
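Equation 2.1 can be illustrated with a minimal Python sketch. The study itself was carried out in MATLAB; the function names `neuron_output`, `linear`, and `log_sigmoid`, as well as the numerical values, are assumptions chosen only for illustration and do not come from the thesis code.

```python
import math

def neuron_output(w, p, b, phi):
    """Return a = phi(w^T p + b) for a single artificial neuron."""
    n = sum(wi * pi for wi, pi in zip(w, p)) + b  # linear summer
    return phi(n)                                 # activation function

def linear(n):
    """Identity activation (hypothetical choice for comparison)."""
    return n

def log_sigmoid(n):
    """Smooth, non-linear activation squashing n into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

# Illustrative weights, input, and bias (assumed values).
w = [0.5, -1.0]
p = [2.0, 1.0]
b = 0.25

a_linear = neuron_output(w, p, b, linear)        # net input passed through unchanged
a_sigmoid = neuron_output(w, p, b, log_sigmoid)  # same net input, different response
```

The same weighted sum produces different outputs depending on the activation function selected, which is the point made by Equation 2.1.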

Using  this  basic  building  block,  parallel-processing  networks  can  be  constructed. 
Feeding  the  same  input  to  several  neurons  results  in  a  network  layer  of  parallel 


Figure  II-2.  Artificial  Neuron.  After  Ref.  [Hagan,  et  al,  1996,  p.  4-4] 
processing elements. The data input to these processing elements could be a vector or
matrix  of  information  originating  from  an  external  sensor  or  an  internal  storage  device. 
But,  when  this  feed  comes  from  an  upstream  neural  layer,  or  alternatively,  when  the  layer 
output  supplies  a  subsequent  downstream  network  layer,  complex  network  structures  are 
assembled.  Thus,  even  though  current  neural  network  architectures  fall  short  of  the 
physiological  capabilities,  artificial  neural  networks  begin  to  resemble  the  human  brain. 

With  this  model  of  an  artificial  neuron,  a  single-layer  Mean  Separator  Neural 
Network  will  be  built  and  examined.  Further  details  on  neural  networks  can  be  obtained 
by  consulting  listed  references  (Dayhoff,  1990),  (Fausett,  1994),  (Hagan,  et  al,  1996), 
(Haykin,  1994). 
III. CLASSIFICATION


Chapter II briefly discussed neural network fundamentals. In Chapter III, a
specific  application  of  this  computational  tool  will  be  considered. 

Adept  at  solving  problems,  neural  networks  are  being  used  in  a  growing  number 
of  diverse  fields.  In  addition  to  applications  in  engineering,  mathematics,  and  the 
physical  sciences,  they  have  proved  useful  in  medicine,  banking  and  finance,  and 
literature. Table III-1 lists a few of the fields impacted by neural network advancements.


Industry                 Application
-----------------------  ------------------------------------------
Aerospace                Flight Path Simulation
                         Aircraft Control
                         Component Simulation and Fault Detectors
Automotive               Automatic Guidance Systems
Banking                  Document Readers
                         Credit Application Evaluations
Defense                  New Sensors
                         Target Tracking and Weapon Steering
                         Object Discrimination
Electronics              IC Chip Layout and Process Control
                         Failure Analysis
                         Code Sequence Prediction
Entertainment            Animation
                         Special Effects
Finance and Securities   Market Analysis and Forecasting
                         Real Estate Appraisal
                         Credit Line Use Analysis
Insurance                Policy Application Evaluation
Medical                  EEG and ECG Analysis
                         Breast Cancer Cell Analysis
                         Hospital Quality Improvement
Oil and Gas              Exploration
Robotics                 Manipulator Controllers
                         Vision Systems
Speech                   Speech Recognition and Compression
                         Text to Speech Synthesis
Telecommunications       Image and Data Compression
                         Real-Time Language Translator
                         Automated Information Services

Table III-1. Neural Network Applications. After Ref. [Hagan, et al, 1996, pp. 1-5 - 1-6].
Common among these applications is a reliance on the neural network’s natural
ability to recognize patterns. As a result, neural networks are commonly tasked with
separating data into a finite number of classes, i.e., classifying. Classification is the task
of categorizing observations into distinct groups based on characteristics of the class. For
example,  when  separating  fruit,  shape,  weight,  size,  color,  texture,  or  smell  could  be  used 
to  differentiate  oranges  from  apples  or  bananas. 

The  attributes  used  to  separate  the  distinct  classes  are  called  features.  These 
features,  arranged  as  vectors,  comprise  the  problem’s  input  or  data  space.  Although  it 
may  seem  that  the  likelihood  of  correct  classification  increases  with  higher  feature  space 
dimensionality,  this  is  not  necessarily  the  case.  For  instance,  consider  a  person  wishing 
to  purchase  an  automobile.  He  may  convey  to  a  dealer  in  meticulous  detail  the 
specifications he desires (e.g., exterior color, type of interior, engine horsepower, gas
mileage,  trunk  capacity,  wheel  base  length,  audio  components,  etc.)  so  as  to  identify  a 
particular  vehicle.  Imagine  the  dealer’s  exasperation  as  the  customer  goes  through  this 
litany. The main disadvantages of the precision characterized in this example are that

1. irrelevant and/or noisy features may be taken into account, and

2. a large sample is required to assess the robustness of the features used.

In  addition,  relying  on  such  a  large  feature  space  increases  the  computational  load  and, 
consequently,  processing  time  of  the  problem.  (Duzenli  and  Fargues,  1998) 

But  alternatively,  consider  the  overzealous  salesman  who  bombards  a  customer 
with  countless  questions  without  receiving  any  satisfactory  answers  in  return.  Often,  the 
particular  pieces  of  information  needed  may  not  be  obtainable.  Solving  the  classification 
dilemma  thereby  becomes  a  problem  of  identifying  an  algorithm  that  will  type 
observations  to  the  correct  class  when  only  a  reduced  feature  space,  either  by  design  or  as 
dictated  by  the  situation,  is  available. 

Feature  determination  and  extraction  are  vital  aspects  of  the  classification 
problem;  however,  the  main  emphasis  of  this  thesis  will  be  algorithm  identification  and 
testing.  As  will  be  seen,  the  method  by  which  neural  networks  classify  is  dependent  on 
the  algorithm  used.  But,  by  no  means  are  neural  networks  the  only  tool  used  to  separate 
data into proper classes. In a paper presented at the 1999 Military Communication
International  Symposium,  Sills  identifies  methods  studied  to  classify  modulated  signals. 
These  efforts  focused  on  frequency-domain  parameters  (Ghani  and  Lamontagne,  1993), 
(Lallo,  1999);  statistical  attributes  of  various  signal  parameters  (Sills,  1999);  and  higher 
order statistics of cyclostationary signals (Reichert, 1992). With regard to neural
networks,  these  parameters  could  constitute  the  features  of  interest. 

Specifically,  this  thesis  continues  the  development  of  the  Mean  Separator  Neural 
Network  (MSNN)  originally  proposed  by  Duzenli  and  Fargues  for  classifying  underwater 
signals  (Duzenli  and  Fargues,  1998).  To  gauge  its  performance,  the  MSNN  classification 
capability  was  measured  against  a  single-layer  perceptron  neural  network  -  the  least 
complex  neural  network  used  for  classification  -  and  a  classifier  based  solely  on  the 
statistical  characteristics  of  a  particular  class.  This  statistical  classifier  is  considered  next. 

A.  STATISTICAL  CLASSIFIER 

A  statistical  classifier  served  as  one  benchmark  for  the  results  obtained  in  this 
study.  Statistical  classifiers  model  the  problem  space  based  on  data  attributes  (such  as 
mean,  covariance,  or  any  higher  order  moment).  Consequently,  they  may  also  be  known 
as  parametric  classifiers.  Non-parametric  classifiers,  on  the  other  hand,  approximate  the 
problem  based  on  actual  empirical  data.  Neural  networks  are  non-parametric  classifiers. 

For  this  study,  the  statistical  classifier  used  was  the  quadratic  classifier  derived 
from  the  Bayes  likelihood  ratio,  which  has  been  shown  to  minimize  error  probability 
(Fukunaga,  1990,  p.  124).  The  formulation  of  the  decision  rule  governing  the  quadratic 
classification  algorithm  follows. 

Consider a space composed of m classes, namely $\pi_1, \pi_2, \pi_3, \ldots, \pi_m$. At some
time, an observation x belonging to class $\pi_k$ occurs. The decision rule will classify x to
$\pi^*$ so as to minimize error; that is, classify x to $\pi^* = \pi_i$. Setting the loss function for this
situation as

$$L(\pi_i, \pi_k) = \begin{cases} 0, & i = k \\ 1, & i \neq k \end{cases} \tag{3.1}$$
implies that no loss arises when correct classification occurs, while unit loss results from
improper classification. From Equation 3.1, the decision rule is given by

$$\pi^*(\mathbf{x}) = \pi_i \quad \text{if} \quad P(\pi_i \mid \mathbf{x}) > P(\pi_j \mid \mathbf{x}), \quad \forall j,\ j \neq i. \tag{3.2}$$

Using Bayes’ Rule to rewrite the conditional a posteriori probabilities in terms of the
density function $p(\mathbf{x} \mid \pi_k)$ and the a priori probabilities $P_k$ leads to

$$\pi^*(\mathbf{x}) = \pi_i \quad \text{if} \quad p(\mathbf{x} \mid \pi_i) P_i > p(\mathbf{x} \mid \pi_j) P_j, \quad \forall j,\ j \neq i. \tag{3.3}$$

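For a two-class (i = 1, j = 2) multivariate normal system, $p(\mathbf{x} \mid \pi_k)$ can be expressed as

$$p(\mathbf{x} \mid \pi_k) = \frac{1}{|2\pi\boldsymbol{\Sigma}_k|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k)\right] \tag{3.4}$$

with $\boldsymbol{\Sigma}_k$ the class covariance matrix, $\boldsymbol{\mu}_k$ the class mean vector, and x the observation.
(Inside the determinant, $\pi$ is the mathematical constant, not a class label.)
Substituting Equation 3.4 into the inequality of Equation 3.3 yields

$$\pi^*(\mathbf{x}) = \pi_1 \quad \text{if} \quad \frac{P_1}{|2\pi\boldsymbol{\Sigma}_1|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_1)^T \boldsymbol{\Sigma}_1^{-1} (\mathbf{x} - \boldsymbol{\mu}_1)\right] > \frac{P_2}{|2\pi\boldsymbol{\Sigma}_2|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_2)^T \boldsymbol{\Sigma}_2^{-1} (\mathbf{x} - \boldsymbol{\mu}_2)\right]. \tag{3.5}$$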

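Since both sides of the inequality are positive, taking the natural logarithm of each term
in Equation 3.5, multiplying by two, and cancelling the constant common to both sides results in

$$\ln\frac{1}{|\boldsymbol{\Sigma}_1|} - (\mathbf{x} - \boldsymbol{\mu}_1)^T \boldsymbol{\Sigma}_1^{-1} (\mathbf{x} - \boldsymbol{\mu}_1) + 2\ln P_1 > \ln\frac{1}{|\boldsymbol{\Sigma}_2|} - (\mathbf{x} - \boldsymbol{\mu}_2)^T \boldsymbol{\Sigma}_2^{-1} (\mathbf{x} - \boldsymbol{\mu}_2) + 2\ln P_2. \tag{3.6}$$

Alternatively, Equation 3.6 can be expressed as

$$\ln|\boldsymbol{\Sigma}_2| + (\mathbf{x} - \boldsymbol{\mu}_2)^T \boldsymbol{\Sigma}_2^{-1} (\mathbf{x} - \boldsymbol{\mu}_2) - 2\ln P_2 > \ln|\boldsymbol{\Sigma}_1| + (\mathbf{x} - \boldsymbol{\mu}_1)^T \boldsymbol{\Sigma}_1^{-1} (\mathbf{x} - \boldsymbol{\mu}_1) - 2\ln P_1. \tag{3.7}$$

When Equation 3.6 or 3.7 holds, observation x is categorized as belonging to class $\pi_1$.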

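Considering the original problem of m classes, the decision criterion is stated here
as Equation 3.8:

$$d_i(\mathbf{x}) = \ln|\boldsymbol{\Sigma}_i| + (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - 2\ln P_i. \tag{3.8}$$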




Therefore,  using  the  mean  vector  and  covariance  matrix  of  each  class,  m  decision  values 
can  be  calculated  for  the  observation  x.  The  correct  classification  of  x  is  the  class  that 
gives  the  lowest  value  for  d.  (Brunzell  and  Eriksson,  1999) 

Unfortunately,  Equation  3.8  requires  that  the  data  set  be  normally  distributed. 
When  this  is  the  case,  the  quadratic  classifier  performs  remarkably  well. 
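The decision value of Equation 3.8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature space is restricted to two dimensions so that the determinant and inverse of the covariance matrix have closed forms, and the function names, class labels, means, covariances, and priors are all hypothetical rather than taken from the study.

```python
import math

def quadratic_score(x, mean, cov, prior):
    """d_i(x) = ln|S| + (x - m)^T S^{-1} (x - m) - 2 ln P for a 2x2 covariance S."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (s22 * d0 * d0 - (s12 + s21) * d0 * d1 + s11 * d1 * d1) / det
    return math.log(det) + quad - 2.0 * math.log(prior)

def classify(x, classes):
    """Assign x to the class with the lowest decision value d_i(x)."""
    scores = {name: quadratic_score(x, m, s, p) for name, (m, s, p) in classes.items()}
    return min(scores, key=scores.get)

# Hypothetical class statistics: (mean vector, covariance matrix, a priori probability).
classes = {
    "class_1": ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
    "class_2": ([3.0, 3.0], [[2.0, 0.5], [0.5, 1.0]], 0.5),
}

label_near_origin = classify([0.2, -0.1], classes)
label_near_second = classify([2.8, 3.1], classes)
```

In practice the mean vectors and covariance matrices would be estimated from training data for each class; here they are simply assumed.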

B.  PERCEPTRON 

1.  Principles  of  Operation 

Inspired  by  the  assertion  that  “in  spite  of  its  apparent  simplicity,  the  (single  layer 
perceptron)  trained  by  adaptive  optimization  techniques  is  in  fact  a  very  rich  family  of 
linear  classifiers,”  the  second  benchmark  used  to  gauge  the  MSNN  performance  was  a 
perceptron  neural  network  (Raudys,  1996).  Developed  in  the  1950s  by  Frank  Rosenblatt, 
perceptrons are designed to linearly separate adjacent class groups (Figure III-1). Each
boundary in Figure III-1 is determined using a separate perceptron component, shown in
Figure III-2. In this figure, the hard limit layer represents the actual processing element.
The input block, composed of R-dimensional vectors, p, corresponds to the training or
observation data. For R greater than two, the decision boundary shown in Figure III-1


Figure III-1. Linearly Separable Classes.


Figure  III-2.  Single  Perceptron  Processing  Element.  After  Ref.  [Hagan, 
et  al,  1996,  p.  4-4] 


becomes  a  hyperplane.  The  weight  row  vector,  w,  and  bias  scalar,  b,  transform  the  input 
observations into a scalar output n, which is then non-linearly mapped by the activation
function, $\varphi$. The perceptron output therefore equates to

$$a = \varphi(\mathbf{w}\mathbf{p} + b). \tag{3.9}$$

The activation function $\varphi$ normally used for the perceptron is the hard limit, or hardlim.
Figure III-3 illustrates the characteristic of this transformation.

As shown in Figure III-3, the only possible outputs of a single perceptron neural
network are 0 and 1. Consequently, the neural network can only separate two classes; the
decision boundary, for example, isolating class $\pi_1$ (network output 0) from class $\pi_2$
(network output 1).

This  decision  boundary  is  specified  by  the  hardlim  argument  and  is  represented 
mathematically  by  the  linear  equation 

$$\mathbf{w}\mathbf{p} + b = 0. \tag{3.10}$$

Figure III-3. hardlim Activation Function.


If  the  inner  product  of  the  input  vector  p  and  the  weight  vector  w  is  greater  than  -b,  the 
hardlim  non-linear  transformation  will  map  to  1;  if  the  inner  product  is  less  than  -b, 
hardlim  will  map  to  0.  This  provides  the  distinction  needed  for  classifying  observations. 
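A short Python sketch may help make the role of the decision boundary concrete. It assumes that an input lying exactly on the boundary maps to 1 (the text above leaves that case unspecified), and the weights, bias, and test points are illustrative values only.

```python
def hardlim(n):
    """Hard limit: 1 when n >= 0, otherwise 0 (boundary case assumed to map to 1)."""
    return 1 if n >= 0 else 0

def perceptron(w, b, p):
    """Single perceptron output a = hardlim(w p + b)."""
    n = sum(wi * pi for wi, pi in zip(w, p)) + b
    return hardlim(n)

# Assumed boundary: p1 + p2 - 1.5 = 0.
w, b = [1.0, 1.0], -1.5
points = [[0, 0], [0, 1], [1, 0], [1, 1]]
outputs = [perceptron(w, b, p) for p in points]
```

Only the point whose inner product with w exceeds -b (here, 1.5) lands on the output-1 side of the line; the other three fall on the output-0 side.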


Figure  III-4.  Multiple  Perceptron  Neural  Network.  After  Ref.  [Hagan, 
et  al,  1996,  p.  4-4] 
Since each perceptron can distinguish only two different classes, classification
problems involving more than two choices require a multiple-neuron architecture: $\mu$
(rounded up to the next integer) perceptrons are needed to classify $2^{\mu}$ different classes.
The three-class case shown in Figure III-1, for instance, requires two processing
elements. Using matrix-vector notation, Figure III-2 can be modified to illustrate the
general case of a multi-perceptron architecture and multiple trials, $\eta$ (Figure III-4).

With $\mu$ processing elements, the decision rule for multi-neuron networks must
consider a $\mu$-dimensional output vector of 1s and 0s. Each unique combination of 1 and 0
corresponds to a particular class. The typing of an input observation is determined by
matching the neuro-classifier output to one of these different sequences. Unfortunately,
when the number of possible bit strings exceeds the number of classes, the input data may
type to a non-class sequence. This frequently occurred during the simulations discussed
in Chapters IV and V.
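The output coding described above can be sketched as follows. The assignment of bit strings to classes is a hypothetical choice made for illustration, as are the function names; the sketch only shows how an unused bit string arises when the number of classes is not a power of two.

```python
def perceptrons_needed(num_classes):
    """Smallest number of binary outputs able to label num_classes classes."""
    return max(1, (num_classes - 1).bit_length())

def decode(output_bits, codebook):
    """Return the class matching the output vector, or None for a non-class sequence."""
    return codebook.get(tuple(output_bits))

# Hypothetical coding for a three-class problem using two perceptrons.
codebook = {(0, 0): "class_1", (0, 1): "class_2", (1, 0): "class_3"}

count = perceptrons_needed(3)
matched = decode([0, 1], codebook)
unmatched = decode([1, 1], codebook)  # the fourth bit string has no class assigned
```

With two perceptrons there are four possible output vectors but only three classes, so one output vector cannot be typed to any class.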

In  summary,  as  an  observation  is  processed  through  a  trained  perceptron  network, 
the  classifier  output  will  identify  the  appropriate  class  type  for  both  single  and  multiple 
neuron  cases.  Training  the  neuro-classifier  to  determine  the  proper  output  is  discussed 
next. 

2.  Training 

Prior  to  implementing  the  perceptron  neuro-classifier,  the  network  must  be  trained 
to  recognize  different  classes.  This  training  is  accomplished  through  a  supervised 
learning  approach  in  which  sets  of  input  data  and  corresponding  target  output  are 
presented  to  the  neural  network.  The  network  batch  processes  the  input  observations  for 
comparison  of  the  resulting  output  to  the  desired  output.  A  difference  error  between 
these  two  output  values  is  calculated  and  used  to  update  the  perceptron  parameters  -  the 
network’s  weight  vector  and  bias.  Since  the  network  can  only  output  0  or  1,  the  error 
generated is limited to either 0 or ±1 (or, for multi-perceptron networks, a vector of 0s
and ±1s). If the error is zero, no weight or bias update occurs.

When  the  error  is  non-zero,  the  weight  vector  is  updated  by  adding  a  correcting 
term (the product of the error and input data) to the weight vector. For the bias, the error
is simply added to the bias. Mathematically, Equations 3.11 and 3.12 compactly show this
perceptron  learning  rule  for  the  general  case  of  multiple  neuron  networks  as 

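$$\mathbf{W}^{new} = \mathbf{W}^{old} + \mathbf{e}\,\mathbf{p}^T = \mathbf{W}^{old} + (\mathbf{t} - \mathbf{a})\,\mathbf{p}^T \tag{3.11}$$

$$\mathbf{b}^{new} = \mathbf{b}^{old} + \mathbf{e} = \mathbf{b}^{old} + (\mathbf{t} - \mathbf{a}) \tag{3.12}$$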

These  operations  improve  classification  performance  by  adjusting  the  slope  and 
position  of  the  perceptron  decision  boundary  towards  the  input  data  point.  In  doing  so, 
the  linear  separator  incrementally  rotates  and  translates  to  place  the  input  data  on  the 
correct  side  of  the  decision  boundary. 
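A single application of the learning rule in Equations 3.11 and 3.12 can be sketched in Python for the one-neuron case. The function name `update`, the starting weights, and the training pair are assumptions for illustration, and an input exactly on the boundary is assumed to map to 1.

```python
def hardlim(n):
    """Hard limit: 1 when n >= 0, otherwise 0 (boundary case assumed to map to 1)."""
    return 1 if n >= 0 else 0

def update(w, b, p, t):
    """Apply one perceptron learning step for input p with target t."""
    a = hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)
    e = t - a                                    # error is 0, +1, or -1
    w_new = [wi + e * pi for wi, pi in zip(w, p)]  # Equation 3.11, single neuron
    b_new = b + e                                # Equation 3.12
    return w_new, b_new, e

# Assumed starting point: zero weights and bias, one misclassified input.
w, b = [0.0, 0.0], 0.0
p, t = [1.0, 2.0], 0
w1, b1, e1 = update(w, b, p, t)     # error is non-zero, so the boundary moves
w2, b2, e2 = update(w1, b1, p, t)   # same input again, now correctly typed
```

After the first step the input lies on the correct side of the boundary, so the second step produces zero error and leaves the weights and bias unchanged.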

3.  Training  Termination 

An  iterative  process,  perceptron  training  involves  cycling  through  the  input/target 
output  pairs  -  each  iteration  through  the  entire  data  set  constituting  an  epoch  -  until 
network  convergence.  Here,  convergence  refers  to  reaching  and  maintaining  a  steady 
state  error  condition.  For  linearly  separable  classes,  perceptron  training  results  in  the  best 
case,  zero-error  solution  within  a  finite  number  of  epochs  (see  Appendix  A). 

Unfortunately,  linearly  separable  problems  are  an  ideal  classification  case. 
Convergence,  in  general,  does  not  imply  a  zero-error  final  state  as  the  nature  of  the 
classification  problem  may  dictate  that  the  steady  state  solution  includes  a  constant  error 
level.  Or,  as  another  possible  outcome,  the  neural  network  may  not  converge  at  all,  but 
instead  oscillate  or  erratically  deviate  about  a  fixed  value.  And  finally,  even  when  the 
network  converges,  there  is  no  guarantee  that  this  constant  state  will  be  attained  within  a 
reasonable  time  period.  For  these  less  than  optimal  cases,  termination  parameters  signal 
when  to  stop  network  training.  Typically  these  parameters  are  satisfied  by  reaching  a 
maximum  number  of  epochs  or  a  maximum  acceptable  performance  level. 

The  simplest  approach  to  end  network  training  would  be  to  reach  a  prescribed 
maximum  number  of  training  cycles.  When  properly  chosen,  this  epoch  limit  can  assure 
attaining an adequate solution. Unfortunately, when specified too low, unsatisfactory
network  output  may  result  since  the  network  would  not  have  had  sufficient  time  to 
achieve  an  acceptable  final  weight  and  bias.  Conversely,  fixing  the  maximum  epoch 
setpoint too high would increase the likelihood of adequate training but at the cost of an
excessively  long  training  period. 

But,  determining  the  number  of  epochs  required  to  obtain  an  optimal  solution 
hinges  on  specifying  what  is  meant  by  “optimal”  solution.  To  define  “optimal”  in  this 
sense  requires  having  a  priori  information  of  the  input  data  distribution.  For  a  linearly 
separable  classification  problem,  an  optimal  solution  would  lead  to  zero-error.  For  other 
situations,  a  predetermined  metric  specifying  an  acceptable  error  limit,  such  as  a 
maximum  mean  squared  error  or  sum  of  squared  errors,  could  be  used  to  end  network 
training.  Regardless  of  the  termination  parameter  used,  prior  knowledge  of  the  input  data 
allows  better  approximation  of  the  maximum  epoch  limit.  Combining  this  maximum 
number  of  iterations  with  an  appropriately  set  performance  measure  provides  for 
adequate  control  of  the  training  length. 
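The combination of an epoch limit with a performance measure can be sketched as a training loop. This is a minimal Python illustration, not the MATLAB code of Appendix C: it assumes sample-by-sample updates, a sum-of-squared-errors measure accumulated over each epoch, zero initial weights, and a small linearly separable data set (a logical AND) chosen only as an example.

```python
def hardlim(n):
    """Hard limit: 1 when n >= 0, otherwise 0 (boundary case assumed to map to 1)."""
    return 1 if n >= 0 else 0

def train(samples, max_epochs, error_goal=0):
    """Train one perceptron until the epoch limit or the error goal is reached."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    sse = None
    epochs_run = 0
    for epoch in range(1, max_epochs + 1):
        sse = 0  # sum of squared errors observed during this epoch
        for p, t in samples:
            a = hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)
            e = t - a
            w = [wi + e * pi for wi, pi in zip(w, p)]
            b += e
            sse += e * e
        epochs_run = epoch
        if sse <= error_goal:  # performance criterion met, stop early
            break
    return w, b, epochs_run, sse

# Assumed linearly separable training set: logical AND of two binary inputs.
samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

w, b, epochs_run, sse = train(samples, max_epochs=100)
predictions = [hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b) for p, _ in samples]

# An epoch limit set too low ends training before the error goal is met.
_, _, short_epochs, short_sse = train(samples, max_epochs=2)
```

Because an epoch with zero accumulated error involves no weight or bias update, reaching the error goal of zero here means every training sample is typed correctly. The second call shows the other outcome discussed above, where the epoch limit ends training with a non-zero error.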

4.  Limitations 

Section III.B has dealt with using the perceptron neural network for classification
purposes.  Through  a  simple  learning  rule  (Equations  3.11  and  3.12),  perceptrons  can 
classify  to  zero-error  solutions  in  a  finite  amount  of  time.  Unfortunately,  as  linear 
classifiers,  perceptrons  accomplish  this  only  for  linearly  separable  cases.  As  a  result, 
perceptron  networks  rarely  converge  to  zero-error  solutions,  thus  requiring  the 
implementation  of  termination  parameters  to  limit  network  training. 

This, however, is not the principal disadvantage of the perceptron network.
Recalling that the perceptron uses the hardlim transform, the network’s piecewise
constant activation function is discontinuous at its threshold, hence non-differentiable
there, and does not allow application of gradient-based mathematical optimization
techniques. Solving classification problems, therefore,
becomes  tedious  as  the  iterative  process  amounts  to  “hunting-and-pecking”  for  the  best  fit 
(i.e.,  smallest  error)  solution.  This  trial-and-error  method  limits  perceptron  efficacy. 

Yet  despite  these  inadequacies,  improvements  in  perceptron  efficiency  are 
possible  with  multiple  layer  network  design.  The  next  section,  however,  will  show  that 
by  design  the  MSNN  is  a  single  layer  neural  network.  Because  of  the  focus  on  this 
architecture,  this  investigation  only  considered  single  layer  perceptron  networks. 

C.  MEAN  SEPARATOR


As  previously  mentioned,  classification  requires  (1)  the  extraction  and  reduction 
of  features  that  characterize  the  distinct  categories  and  (2)  the  application  of  an  analytical 
tool  that  evaluates  and  separates  observations.  This  thesis,  concerned  principally  with  the 
latter  requirement,  is  focused  on  the  Mean  Separator  Neural  Network  (MSNN)  originally 
presented  by  Duzenli  and  Fargues  (1998).  In  addition,  three  variations  to  this  standard 
mean  separator  algorithm  were  investigated  to  determine  if  enhanced  system  performance 
and  robustness  could  be  achieved. 

1.  Principles  of  Operations 

The MSNN differentiates two classes by evaluating one-dimensional projections
of each data distribution onto varying axes to ascertain which transformation direction
maximizes the spread between the class mean values; hence the term "mean separator."
Figure III-5 illustrates this concept in two-dimensional space by showing two possible
mean separator projection axes. The ellipses represent two classes and the shading within
each conveys the data distribution, the darker regions being more densely populated than
the lighter. The orthogonal axes correspond to two elements of the feature space. The
slanted solid lines indicate the projection axes and the slanted dashed lines are the
projections of the class means onto these axes.

Figure III-5. MSNN Projection. Projection lines and data distribution.
Due to greater mean separation, (a) is the preferred projection.

Of the two projections shown in Figure III-5, case (a) with the larger mean
separation depicts the preferred selection. Class typing of future observations would then
entail projection of the data point onto this axis and association to the nearest class mean.
As shown in Figure III-6, the observation plotted would type to the class π1.


Figure  III-6.  MSNN  Class  Typing. 


Multiple projection axes are needed to distinguish all pairwise combinations when
considering more than two categories. Using the MSNN, Duzenli investigated two
methods to identify observations as one of more than two classes. One algorithm
determined all possible pairs of classes. For the general case of m classes, namely π1, π2,
π3, ..., πm, k possible combinations exist, with k determined by

\[ k = \binom{m}{2} = \frac{m!}{2!\,(m-2)!} = \frac{m(m-1)}{2}. \tag{3.13} \]

Each of the k projections corresponds to a separate processing element in the MSNN.
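
The pairwise count in Equation 3.13 can be illustrated with a short sketch. The function name `pairwise_neurons` is hypothetical and is not taken from Duzenli's implementation; it only enumerates the class pairs that would each need one processing element.

```python
from itertools import combinations
from math import comb


def pairwise_neurons(m):
    """Return one (i, j) class pair per MSNN processing element for m classes."""
    return list(combinations(range(1, m + 1), 2))


print(pairwise_neurons(3))  # [(1, 2), (1, 3), (2, 3)]

for m in (2, 3, 5, 10):
    # The enumerated pairs, the binomial coefficient, and m(m-1)/2 all agree.
    print(m, len(pairwise_neurons(m)), comb(m, 2), m * (m - 1) // 2)
```

For m = 10 classes the strict pairwise routine already calls for 45 processing elements, which is the cost weighed against the class/non-class alternative.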

An alternate classification method suggested by Duzenli separates the data space
into class i and non-class i observations. Segmenting the data as such reduces the
required number of processing elements to m, the number of classes. This second
alternative involves a lower computational requirement due to the significantly fewer
neurons and, therefore, would appear to be the better choice. Yet caution is warranted
when using this latter alternative, since assembling the data into class/non-class clusters
may alter statistical parameters so as to preclude accurate data typing. Because of this,
the strict pairwise routine was followed, irrespective of the higher number of neurons
needed. (Duzenli, 1998)

The mechanics of MSNN operations involve three distinct phases: training,
typing, and decision-making. Explaining these stages, however, requires understanding
the network's basic building block: the MSNN processing element, or neuron. This will
be considered next.

2.  Processing  Element 

Shown schematically in Figure III-7, the MSNN processing element differs little
from the neuron used in perceptron neural networks. Aside from the inclusion of a scalar
multiplier and adder that serve to increase the neuron's dynamic range by first amplifying
and then shifting the activation function output, the principal difference between the
perceptron and mean separator processing elements is the choice of activation function.
Recall that the perceptron uses a hard limit function that maps the neural output to either
0 or 1. Since this transform is not analytic, a principal drawback of the perceptron was
that numerical techniques could not be used to optimize a solution.

Figure III-7. MSNN Processing Element

In contrast, the MSNN does use a differentiable activation function, Φ: the
logarithmic-sigmoid, or logsig, function. The characteristic and closed form equation for
the logsig function (Figure III-8) define a smooth curve that gradually approaches 1 as its
argument increases to positive infinity; and 0, as the argument decreases to negative
infinity. Hence, differential optimization methods may be applied to train and improve
neuron performance. This network training will be addressed in more detail shortly.

Figure III-7 shows that the MSNN output equals

\[ \text{MSNN neuron output} = 20 \cdot \mathrm{logsig}(\mathbf{w} \cdot \mathbf{p} + b) - 10. \tag{3.14} \]

As mentioned before, Equation 3.14 incorporates two scalar terms to increase network
classification sensitivity. Arbitrarily chosen, the gain value of 20 amplifies the logsig
output while the threshold term sets the MSNN neuron output range at -10 to 10.
Implementing this MSNN neural output results in a performance measure and training
method that controls weight and bias updates.

Figure III-8. logsig Activation Function.
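
As an illustration of Equation 3.14, the following minimal sketch evaluates a single MSNN processing element. The weight, bias, and input values are illustrative assumptions only, and the function names are hypothetical rather than taken from the original implementation.

```python
import math


def logsig(n):
    """Numerically stable log-sigmoid, 1 / (1 + exp(-n))."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def msnn_output(w, p, b, gain=20.0, shift=10.0):
    """Equation 3.14: gain * logsig(w . p + b) - shift."""
    n = sum(wi * pi for wi, pi in zip(w, p)) + b
    return gain * logsig(n) - shift


w, b = [0.5, -0.25], 0.1                 # illustrative parameters only
print(msnn_output(w, [0.0, 0.4], b))     # logsig argument is 0, output is 0
print(msnn_output(w, [40.0, 0.0], b))    # large positive argument, close to +10
print(msnn_output(w, [-40.0, 0.0], b))   # large negative argument, close to -10
```

The gain and shift map the logsig range of 0 to 1 onto -10 to 10, so a neuron that separates two classes well produces outputs near opposite ends of that interval.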

3.  Training 

Equation 3.14 defines the MSNN non-linear transformation. But before this
equation can be used for classification, network training is required. This training
amounts to determining the projection parameters (that is, the weight vector, w, and the
bias scalar, b) that maximize class separation. For the perceptron, these parameters
simply defined class boundary lines and were found iteratively by cycling through input
data/target output pairs until a specific performance parameter was satisfied. For the
MSNN, these weight and bias parameters identify the projection axis upon which
maximal mean separation occurs. Consecutive epochs also refine the MSNN parameters,
but since the logsig activation function is analytic, optimization techniques can be used.
This requires identifying an MSNN performance function.

a.  Mean-Difference  Performance  Function 

Duzenli defined a mean-difference (MD) projection index for the MSNN.
This thesis defines an analogous form (Equation 3.15) of his mean-difference equation
as:

\[ \begin{aligned} MD &= -\left[ E\{ (20 \cdot \Phi(\mathbf{w} \cdot \mathbf{p}_1 + b) - 10) - (20 \cdot \Phi(\mathbf{w} \cdot \mathbf{p}_2 + b) - 10) \} \right]^2 \\ &= -\left[ 20 \cdot E\{ \Phi(\mathbf{w} \cdot \mathbf{p}_1 + b) - \Phi(\mathbf{w} \cdot \mathbf{p}_2 + b) \} \right]^2, \end{aligned} \tag{3.15} \]

with E being the expectation operator and Φ, the logsig activation function (Duzenli,
1998). From this equation, the origin of the term "mean-difference" becomes clear. The
equation maps observations belonging to two separate classes, denoted by the vectors p1
and p2, using the system's performance parameters w and b. Applying the non-linear
logsig function to this linear transformation projects the p1 and p2 data spaces onto a
one-dimensional projection axis. Taking the difference of the means of these projections
yields the mean-difference.

With regard to Equation 3.15, squaring the mean-difference emphasizes
the magnitude, and not the sign, of the difference; while the leading negative sign turns
the search for maximal mean separation into a function minimization. Recall from
Equation 3.14 that the purpose of the scalar 20 was to increase sensitivity during class
typing. Because of this gain, Equation 3.15 gives a mean-difference range of zero (when
both data distributions map to 0 or both map to 1) to -400 (when one distribution maps
to 1 and the other to 0). The former value corresponds to the worst case situation; the
latter, to the optimal state.
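
The range of the mean-difference index can be checked numerically. The sketch below replaces the expectation in Equation 3.15 with a sample mean over each class; the two toy classes and the weight values are assumptions chosen only to reach the two extremes.

```python
import math


def logsig(n):
    """Numerically stable log-sigmoid, 1 / (1 + exp(-n))."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def mean_difference(w, b, class1, class2, gain=20.0):
    """Equation 3.15 with expectations estimated by sample means."""
    def mean_phi(points):
        total = sum(logsig(sum(wi * xi for wi, xi in zip(w, p)) + b) for p in points)
        return total / len(points)
    return -(gain * (mean_phi(class1) - mean_phi(class2))) ** 2


class1 = [[2.0, 2.0], [3.0, 2.5]]        # toy data, not from the thesis
class2 = [[-2.0, -2.0], [-3.0, -2.5]]

print(mean_difference([0.0, 0.0], 0.0, class1, class2))    # worst case, magnitude 0
print(mean_difference([10.0, 10.0], 0.0, class1, class2))  # near the optimum of -400
```

With zero weights both classes map to the same logsig value and the index is zero; with a large weight along the direction separating the class means, the index approaches -400.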

The MSNN employs supervised, batch processing of input data to train the
network. Like a perceptron that undergoes explicit supervised learning in which specific
target outputs must be associated with the input data, MSNN learning requires that the
training data be assigned to the correct class. As before, batch training refers to parallel
processing of the input observations, resulting in a single update per epoch, as opposed to
sequential processing in which the system's weights and bias are incrementally changed
after each data input. The MSNN training process is schematically shown in Figure III-9
for a three-class classification case.


Figure  III-9.  3-Class  MSNN:  Training. 


Figure III-9 incorporates three MSNN processing elements into a single
layer network. The training process described above prepares the neuro-classifier to
recognize classes p1, p2, and p3. Unlike the other phases of MSNN implementation,
during the training stage each neuron simultaneously processes two classes of data, as
required by Equation 3.15. The thicker line in the network layer emphasizes this parallel
processing. For each neuron, these calculations yield MD values at the input to the
"weight/bias update" block. If this value falls below a threshold of -360 (empirically
determined to be ninety percent of the optimal value), the neuron's performance
parameters require no further training. When the MD value exceeds -360, weight and bias
updates, dw and db, are determined using a steepest descent algorithm.

b.  Weight  and  Bias  Update  Equations 

When the current projection index is greater than -360, the MSNN
parameters update according to equations of the form

\[ \mathbf{w}[k+1] = \mathbf{w}[k] + \alpha[k] \cdot \mathbf{f}_1[k] \tag{3.16} \]

\[ b[k+1] = b[k] + \alpha[k] \cdot f_2[k], \tag{3.17} \]

where α[k]·f1[k] and α[k]·f2[k] adjust the weight and bias values to improve MD. The
variable learning rate parameter α[k] dictates the incremental step size towards this
improved projection index. The analytical meanings of f1[k] and f2[k] are explained next.

For convenience, Equations 3.16 and 3.17 are compacted into a single
vector equation:

\[ \mathbf{z}[k+1] = \mathbf{z}[k] + \alpha[k] \cdot \mathbf{f}[k]. \tag{3.18} \]

Reiterating that Equation 3.15 drives the weight and bias update, a first-order Taylor
approximation of the mean-difference projection index about a known weight vector and
bias yields

\[ MD(\mathbf{z}[k+1]) = MD(\mathbf{z}[k] + \Delta\mathbf{z}[k]) \approx MD(\mathbf{z}[k]) + \nabla MD(\mathbf{z}[k])^{T} \Delta\mathbf{z}[k], \tag{3.19} \]

with the second term combining the gradient of the performance measure and the change
in z. Seeking a trajectory to the optimal MD of -400 and recognizing that this value is
also the function's lowest possible value requires that MD(z[k+1]) < MD(z[k]). This
implies

\[ \nabla MD(\mathbf{z}[k])^{T} \Delta\mathbf{z}[k] < 0. \tag{3.20} \]

Using Equation 3.18 to define Δz[k] and substituting this into Equation 3.20 results in

\[ \alpha[k]\, \nabla MD(\mathbf{z}[k])^{T} \mathbf{f}[k] < 0, \tag{3.21} \]

with α[k] positive by convention. Since the left side of Equation 3.21 is most negative
when f[k] points in the direction opposite that of the gradient, Equation 3.18 becomes

\[ \mathbf{z}[k+1] = \mathbf{z}[k] - \alpha[k] \cdot \nabla MD(\mathbf{z}[k]). \tag{3.22} \]

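Similarly, Equations 3.16 and 3.17 become

\[ \mathbf{w}[k+1] = \mathbf{w}[k] - \alpha[k]\, \frac{\partial MD[k]}{\partial \mathbf{w}[k]} \tag{3.23} \]

\[ b[k+1] = b[k] - \alpha[k]\, \frac{\partial MD[k]}{\partial b[k]}, \tag{3.24} \]

where the appropriate partial derivative replaces the gradient term. With respect to the
weight vector and bias, the partial derivatives of Equation 3.15 are determined to be

\[ \frac{\partial MD}{\partial \mathbf{w}} = -800 \left[ E\{ \Phi(\mathbf{w} \cdot \mathbf{p}_1 + b) - \Phi(\mathbf{w} \cdot \mathbf{p}_2 + b) \} \right] \cdot \left[ E\{ \Phi'(\mathbf{w} \cdot \mathbf{p}_1 + b)\,\mathbf{p}_1 - \Phi'(\mathbf{w} \cdot \mathbf{p}_2 + b)\,\mathbf{p}_2 \} \right] \tag{3.25} \]

\[ \frac{\partial MD}{\partial b} = -800 \left[ E\{ \Phi(\mathbf{w} \cdot \mathbf{p}_1 + b) - \Phi(\mathbf{w} \cdot \mathbf{p}_2 + b) \} \right] \cdot \left[ E\{ \Phi'(\mathbf{w} \cdot \mathbf{p}_1 + b) - \Phi'(\mathbf{w} \cdot \mathbf{p}_2 + b) \} \right], \tag{3.26} \]

with Φ, the logsig activation function, and its derivative shown below:

\[ \Phi(n) = \mathrm{logsig}(n) = \frac{1}{1 + \exp(-n)} \]

\[ \Phi'(n) = \mathrm{logsig}'(n) = \frac{1}{\exp(n)\,(1 + \exp(-n))^{2}}. \]

Equations 3.23 and 3.24 comprise the MSNN learning rule. The update terms in these
equations correspond to the dw and db terms shown in Figure III-9 that feed back through
the neural network. (Hagan, et al, 1996, pp. 9-2 - 9-3)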
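
The partial derivatives of the mean-difference index can be sanity-checked against a finite-difference estimate. The sketch below is an assumption-laden illustration: expectations are replaced by sample means, the data and parameter values are arbitrary, and the function names are hypothetical. It uses the identity logsig'(n) = logsig(n)(1 - logsig(n)), which is algebraically equal to the closed form 1 / (exp(n)(1 + exp(-n))^2).

```python
import math


def logsig(n):
    """Numerically stable log-sigmoid, 1 / (1 + exp(-n))."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def logsig_prime(n):
    """Derivative of logsig, written as s * (1 - s)."""
    s = logsig(n)
    return s * (1.0 - s)


def net(w, p, b):
    """Linear transform w . p + b used as the logsig argument."""
    return sum(wi * pi for wi, pi in zip(w, p)) + b


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def md(w, b, c1, c2):
    """Mean-difference index: -(20 * difference of mean logsig outputs)^2."""
    diff = mean(logsig(net(w, p, b)) for p in c1) - mean(logsig(net(w, p, b)) for p in c2)
    return -400.0 * diff ** 2


def md_gradient(w, b, c1, c2):
    """Analytic partial derivatives of md with respect to w and b."""
    diff = mean(logsig(net(w, p, b)) for p in c1) - mean(logsig(net(w, p, b)) for p in c2)
    dw = []
    for j in range(len(w)):
        slope = (mean(logsig_prime(net(w, p, b)) * p[j] for p in c1)
                 - mean(logsig_prime(net(w, p, b)) * p[j] for p in c2))
        dw.append(-800.0 * diff * slope)
    db = -800.0 * diff * (mean(logsig_prime(net(w, p, b)) for p in c1)
                          - mean(logsig_prime(net(w, p, b)) for p in c2))
    return dw, db


def numeric_gradient(w, b, c1, c2, h=1e-6):
    """Central finite-difference estimate of the same derivatives."""
    dw = []
    for j in range(len(w)):
        up, down = list(w), list(w)
        up[j] += h
        down[j] -= h
        dw.append((md(up, b, c1, c2) - md(down, b, c1, c2)) / (2.0 * h))
    db = (md(w, b + h, c1, c2) - md(w, b - h, c1, c2)) / (2.0 * h)
    return dw, db


c1 = [[1.0, 0.5], [1.5, 1.0], [0.5, 1.2]]       # arbitrary toy classes
c2 = [[-1.0, -0.5], [-0.5, -1.5], [-1.2, 0.2]]
w, b = [0.3, -0.2], 0.1

print(md_gradient(w, b, c1, c2))
print(numeric_gradient(w, b, c1, c2))
```

Agreement between the two printed gradients, to within finite-difference error, supports the factor of -800 and the product structure of the analytic expressions.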


As an added feature to improve network training, the MSNN step size, or
learning rate, also updates after each iteration. Patterned after the variable learning rate
rules for backpropagation neural networks, the MSNN variable learning rate rules are
summarized below (Hagan, et al, 1996, p. 12-12):

1.  If after one epoch the mean-difference parameter increases by more than four
percent (empirically determined), then the trajectory is diverging from the
desired state. Consequently, the new weight and bias updates are discarded
and the learning rate is halved to minimize movement away from the optimal
MD value.

2.  If after one epoch the mean-difference parameter increases by less than four
percent, then the trajectory is still diverging from the desired MD value. This
movement, however, is tolerable since the change in MD from the previous
value is small. For this case, the learning rate is unchanged and the new
weight and bias updates are accepted.

3.  If after one epoch the mean-difference parameter decreases, then the trajectory
is approaching the optimal value. The new weight and bias updates are
accepted and the learning rate is doubled to increase movement in this
direction.

By doing this, the weight and bias update trajectory is controlled as needed to quickly
approach optimal projection index values or to minimize divergence from an acceptable
solution.
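
The steepest descent update and the three learning rate rules can be sketched together as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: expectations are sample means, the four percent tolerance is measured relative to the magnitude of the previous MD value, a step that leaves MD exactly unchanged is accepted without altering the learning rate, and the initial learning rate, epoch limit, and toy data are arbitrary choices.

```python
import math


def logsig(n):
    """Numerically stable log-sigmoid, 1 / (1 + exp(-n))."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def net(w, p, b):
    return sum(wi * pi for wi, pi in zip(w, p)) + b


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def md(w, b, c1, c2):
    """Mean-difference index with sample means in place of expectations."""
    diff = mean(logsig(net(w, p, b)) for p in c1) - mean(logsig(net(w, p, b)) for p in c2)
    return -400.0 * diff ** 2


def md_gradient(w, b, c1, c2):
    """Partial derivatives of md with respect to the weights and the bias."""
    def dphi(p):
        s = logsig(net(w, p, b))
        return s * (1.0 - s)
    diff = mean(logsig(net(w, p, b)) for p in c1) - mean(logsig(net(w, p, b)) for p in c2)
    dw = [-800.0 * diff * (mean(dphi(p) * p[j] for p in c1) - mean(dphi(p) * p[j] for p in c2))
          for j in range(len(w))]
    db = -800.0 * diff * (mean(dphi(p) for p in c1) - mean(dphi(p) for p in c2))
    return dw, db


def train_neuron(c1, c2, w, b, alpha=0.1, max_epochs=2000,
                 target=-360.0, min_alpha=1e-4, tolerance=0.04):
    """Steepest descent on md with the halve/keep/double learning rate rules."""
    current = md(w, b, c1, c2)
    for _ in range(max_epochs):
        if current < target or alpha < min_alpha:
            break
        dw, db = md_gradient(w, b, c1, c2)
        w_new = [wi - alpha * gi for wi, gi in zip(w, dw)]
        b_new = b - alpha * db
        candidate = md(w_new, b_new, c1, c2)
        increase = candidate - current
        if increase > tolerance * abs(current):
            alpha *= 0.5          # rule 1: discard the step, halve the rate
            continue
        if increase < 0:
            alpha *= 2.0          # rule 3: accept the step, double the rate
        w, b, current = w_new, b_new, candidate   # rules 2 and 3 accept the step
    return w, b, current, alpha


c1 = [[1.0, 2.0], [2.0, 1.0]]           # toy, well separated classes
c2 = [[-1.0, -2.0], [-2.0, -1.0]]
w_fit, b_fit, final_md, final_alpha = train_neuron(c1, c2, [0.05, 0.05], 0.0)
print(w_fit, b_fit, final_md, final_alpha)
```

On this easily separable toy problem the index drops below the -360 threshold almost immediately; harder data would exercise the halving and rejection branch more often.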


c.  Training  Termination 

This training scheme updates the MSNN weight vectors and bias values
until termination conditions are satisfied: either the updated MD value is less than the
empirically established ninety percent of optimal (< -360) or a maximum epoch limit is
reached. With the network now trained, MSNN classification next involves
parameterizing each class to establish the decision rule for separating observations. But,
before discussing these subsequent stages, one final point regarding network training
must be emphasized. From Figure III-8 (plot of the logsig activation function) we recall
that the MSNN activation function output asymptotically approaches 0 or 1. The desired
solution for a classification problem occurs when one class maps to 0 and the other to 1,
as dictated by the argument of the logsig function. Unfortunately, when the initial weight
and bias values, instead of the class observations, dominate the output of the linear
transform used as the logsig argument, the network can become saturated after very little
training. In this saturated state, no further training will occur since the gradient value in
these regions is effectively zero. In short, the network has stalled and training will
terminate based on the low learning rate (threshold set at 10^-4). To prevent this, the
network weights and bias are initialized to low magnitude values and the input features
are normalized. Hence, network training begins in the sloped region of the logsig output
to take advantage of this dynamic region and improve the likelihood of satisfactory
training.
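
The saturation effect can be seen directly from the slope of the logsig function. The short sketch below is only illustrative; the sample arguments are arbitrary and the function names are hypothetical.

```python
import math


def logsig(n):
    """Numerically stable log-sigmoid, 1 / (1 + exp(-n))."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def logsig_prime(n):
    """Slope of the logsig curve at argument n."""
    s = logsig(n)
    return s * (1.0 - s)


for n in (0.0, 2.0, 10.0, 20.0):
    # The slope is largest at n = 0 and collapses as |n| grows.
    print(n, logsig(n), logsig_prime(n))
```

Because every gradient term in the update equations is multiplied by this slope, a logsig argument of large magnitude, whether caused by large initial weights or by unscaled features, leaves almost nothing to drive the weight and bias updates.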

If  training  terminates  on  low  learning  rate  or  high  epoch  cycles  and  not  on 
acceptable  MD,  the  network  is  retrained  after  first  discarding  and  re-initializing  the 
weights  and  biases.  If  training  ends  due  to  a  satisfactory  MD  level  having  been  reached, 
the  weight  and  bias  values  are  stored.  The  MSNN  is  now  ready  to  proceed  to  the  next 
phase  of  determining  specific  class  identifiers. 

4.  Class  Typing  and  Decision-Making 

Tuned to distinguish the different classes, the MSNN must next determine a
distinct identifier for each class. Considering a three-class classification problem as
before, Figure III-10 diagrams how this is accomplished.

Recall that Figure III-9 showed that the neuron at the top of the diagram (neuron 1)
had been trained to separate classes p1 and p2. The training data for these two classes
will again be processed by this neuron. If trained optimally, the processing element will
map one class of data to 10 and the other to -10. At the very least, it is hoped the neuron
maps one class to a positive value and the other to a negative number. But, should both
classes map to the same value after unsatisfactory neuron training, this unfavorable event
is not insurmountable. Since the data point mappings from all neurons comprise the class
identifier, even if one processing element is poorly trained, the other neurons may
potentially provide for unique class identifiers.

Figure III-10. 3-Class MSNN: Typing.

For now, however, assume a p1 data point generates 10, while a p2 observation
turns out -10. A class p3 data point will also be cycled through neuron 1, resulting in
another -10, for instance. Consequently, after taking one observation from each class and
mapping them by neuron 1, the following distinction shown as Table III-2 is realized:


              Training
              classes     Class p1    Class p2    Class p3
Neuron 1      1,2            10         -10         -10

Table III-2. Hypothetical Class p1, p2, and p3 Output
from Trained Neuron 1 (Class p1 vs Class p2).

In Table III-2, the second column indicates the two classes used to train the neuron.

Using the same three training data points, outputs from the remaining two neurons
are also determined. Completing Table III-2 with these remaining data points shows the
unique identity of each class type.


              Training
              classes     Class p1    Class p2    Class p3
Neuron 1      1,2            10         -10         -10
Neuron 2      1,3            10         -10         -10
Neuron 3      2,3            10         -10          10

Table III-2a. Hypothetical Class p1, p2, and p3 Output
from Trained 3-Class Neural Network.

Notice that if neuron 1 had mapped the data points from all classes to 10, for this example
the three classes would still have unique identifiers. In general, however, this is not true.
Neurons 2 and 3 could have been trained such that the resulting specifiers did not uniquely
identify each class type.

When determining class specifiers, the network does not process only one point
from each class through the neurons. To obtain a representative template for each class,
the trained neural network processes all training data. This produces a neuron map of all
data points as shown in Figure III-11. Calculating the average output from each neuron
for each class determines the three class specific identifiers. These identifiers, r1, r2, and
r3 in the three-class case, are then saved for later use in classifying observations.
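
The typing step can be sketched as an average of neuron outputs per class. In the illustration below the three neurons are assigned hand-picked weights rather than trained ones, and the two-dimensional training points are invented, so the numbers only demonstrate the mechanics; all names are hypothetical.

```python
import math


def logsig(n):
    """Numerically stable log-sigmoid, 1 / (1 + exp(-n))."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def neuron_output(w, b, p):
    """MSNN neuron output, 20 * logsig(w . p + b) - 10."""
    return 20.0 * logsig(sum(wi * pi for wi, pi in zip(w, p)) + b) - 10.0


def class_identifiers(neurons, training_sets):
    """Average each neuron's output over all training points of each class."""
    identifiers = {}
    for label, points in training_sets.items():
        identifiers[label] = [
            sum(neuron_output(w, b, p) for p in points) / len(points)
            for (w, b) in neurons
        ]
    return identifiers


# Hand-picked (not trained) neurons for the pairs (1,2), (1,3), and (2,3).
neurons = [([5.0, 0.0], 0.0), ([5.0, -5.0], 0.0), ([-5.0, -5.0], 0.0)]
training_sets = {
    "p1": [[3.0, 0.0], [3.0, 1.0]],
    "p2": [[-3.0, 0.0], [-3.0, -1.0]],
    "p3": [[0.0, 3.0], [1.0, 3.0]],
}

for label, r in class_identifiers(neurons, training_sets).items():
    print(label, [round(value, 2) for value in r])
```

Neuron 1 was never meant to separate class p3 from the others, so its average output for p3 lands between the extremes; the other two neurons still give p3 an identifier distinct from p1 and p2.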

Up to this point the MSNN has processed only training data. Once the network
has learned the characteristics of the input data and can distinguish the separate classes, it
can be used to classify new observations. Shown schematically in Figure III-12, this
process comprises the final stage in classifying observations with the MSNN:
decision-making.

Figure III-11. Neuron Maps for Hypothetical 3-Class MSNN Typing. Each
plot depicts how a trained neuron maps class data. Read
vertically, the plots identify the unique class type specifiers
produced by the MSNN.

Figure III-12. 3-Class MSNN: Decision-Making.

The decision phase begins when a sensor or data storage device provides the
tuned MSNN with an observation. Needless to say, if the training data was conditioned
prior to being processed by the MSNN, so must this new observation. According to
Equation 3.14, the MSNN maps this observation, producing an output from each neuron.
This observation typing, o, is compared to the stored class specifiers, r_i, via a (squared)
Euclidean distance measurement of the general form

\[ d_i = (\mathbf{r}_i - \mathbf{o}_i)^{T} (\mathbf{r}_i - \mathbf{o}_i) \quad \text{for } i = 1, 2, \ldots, m, \tag{3.27} \]

with the index i indicating a particular class. For the standard MSNN the same typing is
compared against every class identifier, so that o_i = o. The minimum distance measure
associates the observation to a particular class.
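
The decision rule amounts to a nearest-identifier search. The sketch below uses the hypothetical identifiers from the three-class example (10 or -10 from each of three neurons) and an invented observation typing; the function name `classify` is an assumption.

```python
def classify(typing, identifiers):
    """Return the class whose identifier is nearest to the observation typing."""
    distances = {
        label: sum((r - o) ** 2 for r, o in zip(identifier, typing))
        for label, identifier in identifiers.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# Identifiers read column by column from the hypothetical three-class example.
identifiers = {
    "p1": [10.0, 10.0, 10.0],
    "p2": [-10.0, -10.0, -10.0],
    "p3": [-10.0, -10.0, 10.0],
}

typing = [8.0, 9.5, 7.0]     # invented outputs of neurons 1, 2, and 3
label, distances = classify(typing, identifiers)
print(label, distances)
```

Because only the ordering of the distances matters, the squared distance can be used directly without taking a square root.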

5.  Summary 

Summarizing  the  main  MSNN  principles,  this  section  has  shown: 

1.  The MSNN projects observations onto the one-dimensional axis that
maximizes separation between the mean value of two class clusters.

2.  The MSNN processing elements utilize a differentiable activation function
(logsig) that saturates at 1 and 0 for input arguments of positive infinity and
negative infinity, respectively. Optimal performance requires initialization of
the network weight and bias to low values to prevent early network saturation
at these asymptotic values.

3.  The MD optimal value of -400 is attained when one class maps to 10 and the
other to -10. The worst case MD value of 0 occurs when the two classes type
to the same output value (both classes mapping to either 10 or -10).

4.  The  MSNN  training  follows  a  steepest  descent  algorithm  that  incorporates  a 
variable  learning  rate  and  terminates  when  ninety-percent  of  the  optimal  MD 
value  is  reached.  Short  of  attaining  this,  MSNN  training  will  cease  when  the 
learning  rate  falls  below  a  set  lower  limit  or  when  a  maximum  number  of 
training  epochs  is  achieved.  If  either  of  these  latter  cases  were  to  occur,  the 
weights  and  bias  would  be  discarded  and  re-initialized  for  re-training. 

5.  Once  trained,  the  MSNN  processes  the  training  data  to  determine  specific 
class  identifiers. 

6.  When available, a new observation is processed through the trained MSNN.
The projection of this observation by the neural network is compared to the
class identifiers. Using a Euclidean distance measure, the observation is
associated with a class.

Previous  trials  have  demonstrated  the  classification  capabilities  of  the  MSNN 
(Duzenli,  1998).  As  indicated  above,  this  was  accomplished  by  training  the  neural 
network  to  maximize  the  separation  between  the  projected  means  of  two  class  clusters. 
Relying on maximal mean separation, however, may not adequately ensure minimal
cluster overlap and, hence, satisfactory classification performance. The next section
expounds on the reasons for this behavior and suggests modifications to the mean
separator classification scheme.

D.  ALTERNATE  MEAN  SEPARATOR  SCHEMES 

Repeated here, Figure III-5 illustrates the principal purpose of the MSNN. As
previously explained, the original MSNN algorithm favors case (a) because of the larger
spread between projected cluster means. Yet, examination of this choice demonstrates an
incongruity of the standard MSNN process. Although case (a) does display greater mean
separation, more cluster overlap also occurs with this selection of projection direction.
Consequently, an observation belonging to class π2 may type to class π1, an inaccurate
selection, because of its position relative to the data cluster. For this reason, case (b)
would be more appropriate. Figure III-13 illustrates this situation.

Figure III-5 (repeated). MSNN Projection.

Figure III-13. Anomalous MSNN Classification Situation.

Ironically, the effect of such a situation would be more profound when there are
fewer class choices. Recall that the number of class alternatives determines the network
size. Fewer possibilities result in a network consisting of a diminished number of
processing elements. This would be disadvantageous since the effect of the irregularity
shown in Figure III-13 could not be offset by the increased network flexibility provided
by other neural mappings. Fortunately, the typical classification situation would entail
more than a few possible choices, so the likelihood of this scenario would be minimal.
Moreover, techniques that compensate for data variance can prevent erroneous
classification such as this. Three such methods are explained here. The first adjusts the
MSNN classification scheme by pre-processing the input data. The second alteration
normalizes the class spread by considering projected data variance. Finally, the third
applies a termination parameter defined for the second modification method to the
standard MSNN.

1.  Input  Data  Preconditioning 

The first attempt to counter overlapping projections of two different classes
involves normalizing the input data distribution. It was conjectured that a tighter data
spread would produce smaller group projections, thereby facilitating class separation.
Figure III-14 demonstrates this hypothesis.

Figure III-14. Postulated Effect of Data Preconditioning.

With this data pre-processing approach, changes to MSNN training and typing
algorithms are not needed. However, in addition to the required preconditioning of
training data and observations, a more sophisticated decision-making scheme would be
implemented.

Prior to submitting training data to the MSNN, the training data is normalized
according to

\[ \mathbf{p}_i^{*} = \frac{\mathbf{p}_i - \boldsymbol{\mu}_i}{\boldsymbol{\sigma}_i} + \boldsymbol{\mu}_i, \tag{3.28} \]

with p_i and p_i* respectively being the data values before and after normalization; μ_i
representing a vector of class feature mean values; and σ_i representing a vector of class
feature standard deviation values, the division being taken feature by feature. We
recognize that this normalization preserves the mean values by removing the feature
averages and then reapplying them after scaling. With n training data points and m
classes, training data normalization would increase the number of floating point
operations by a factor of n*m.
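
The normalization described here (subtract the class feature means, scale by the class feature standard deviations, then add the means back) can be sketched as below. The use of the sample standard deviation and the toy class data are assumptions; the source does not state which estimator was used, and the function names are hypothetical.

```python
import statistics


def class_statistics(points):
    """Per-feature mean and sample standard deviation of one class."""
    columns = list(zip(*points))
    mu = [statistics.mean(column) for column in columns]
    sigma = [statistics.stdev(column) for column in columns]  # needs >= 2 points
    return mu, sigma


def precondition(p, mu, sigma):
    """Apply (p - mu) / sigma + mu feature by feature; sigma must be nonzero."""
    return [(x - m) / s + m for x, m, s in zip(p, mu, sigma)]


training_p1 = [[1.0, 10.0], [3.0, 14.0], [5.0, 18.0]]   # toy class data
mu1, sigma1 = class_statistics(training_p1)
scaled_p1 = [precondition(p, mu1, sigma1) for p in training_p1]

print(mu1, sigma1)
print(scaled_p1)
print(class_statistics(scaled_p1))   # same means, unit standard deviations
```

In this toy class the standard deviations exceed one, so the cluster tightens; a class whose feature standard deviations are below one would instead be spread out by the same operation.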

Having  been  trained  with  normalized  data,  for  the  MSNN  to  accurately  classify 
uncategorized  data  the  observations  must  be  similarly  adjusted.  Therefore,  Equation  3.28 
is  also  applied  to  unclassified  observations  prior  to  processing  by  the  MSNN.  But  while 
the  training  data  can  be  associated  to  a  particular  class,  the  nature  of  the  classification 
problem  dictates  that  the  class  of  the  observation  is  obviously  unknown.  Preconditioning 
of  observations  consequently  calls  for  data  normalization  by  the  statistical  parameters  of 
all  possible  classes.  Accordingly,  the  computational  requirement  has  been  increased  by  a 
factor  of  m,  the  number  of  classes. 

Using the adjusted training data, the MSNN's performance parameters and class
identifiers are determined, as described previously by Figures III-9 and III-10. All
equations used during the MSNN training and typing phase apply. The trained network
then transforms the normalized observations into the decision space, where the network
compares each mapped outcome to the identifier of the particular class associated with
that scaled version. That is, the output resulting from an observation scaled by class i
statistics would be compared to the class i type identifier. In the end, the class identifier
most similar to its corresponding network output as determined by Euclidean distance is
chosen as the proper category of the observation. Compared to that of the standard
MSNN classifier, each mapping and matching routine entails no additional computations.
True, each observation would undergo m such processes, one for each observation
scaling; but, this factor has already been accounted for. Overall then, an input
preconditioning approach increases the number of computer operations by a factor of
(n+1)*m. For large training sets and many distinct classes, the added computational load
is not trivial.
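
The decision scheme for preconditioned input can be sketched as a loop over the candidate classes. Everything concrete in the example is an assumption made for illustration: a one-feature problem, a single hand-set neuron standing in for the trained network, invented class statistics, and rounded identifiers. The function names are hypothetical.

```python
import math


def logsig(n):
    """Numerically stable log-sigmoid, 1 / (1 + exp(-n))."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def network(p):
    """Stand-in for a trained MSNN: one neuron with w = [1.0] and b = 0."""
    return [20.0 * logsig(1.0 * p[0]) - 10.0]


def classify_preconditioned(observation, stats, identifiers, typing_fn):
    """Scale the observation by each class's statistics, type it, and keep
    the class whose identifier is nearest to its own scaled typing."""
    best_label, best_distance = None, float("inf")
    for label, (mu, sigma) in stats.items():
        scaled = [(x - m) / s + m for x, m, s in zip(observation, mu, sigma)]
        typing = typing_fn(scaled)
        distance = sum((r - o) ** 2 for r, o in zip(identifiers[label], typing))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance


stats = {"p1": ([2.0], [0.5]), "p2": ([-2.0], [0.5])}   # invented (mu, sigma)
identifiers = {"p1": [7.6], "p2": [-7.6]}                # invented identifiers

print(classify_preconditioned([1.8], stats, identifiers, network))
print(classify_preconditioned([-2.3], stats, identifiers, network))
```

Each observation passes through the network once per class, which is the factor of m in the operation count discussed above.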

Yet despite this drawback, the disadvantage caused by a large computational
requirement could be overlooked if actual trials demonstrate a considerable improvement
in network performance. Unfortunately, enhanced robustness may not be demonstrated
when input standard deviations are less than one. Under these conditions, normalization
would make the training data distributions more diffuse and not compact. In addition,
since the normalization is performed in the feature space, the effect of input data
preconditioning may not affect the decision space as positively as Figure III-14 shows.
The mapping of the normalized data points may cause the projection distributions to be
tighter, more spread out, or unchanged depending on the neural network's initialization
and training trajectory. For these reasons, decision space normalization is considered as a
second method to enhance MSNN performance.

2.  Projection  Space  Normalization 
a.  Concept 

By  reducing  the  feature  space  noise  level,  the  first  modification  to  the 
MSNN  classification  scheme  sought  to  improve  network  performance  with  only  minimal 
changes to the standard algorithm. Input data normalization was expected to produce a
less ambiguous, more tightly clustered class distribution, and projection into
the MSNN decision space was not expected to disrupt this cohesion. Consequently, the resulting
compact clusters would enhance class separation.

Upon reconsideration, however, it was recognized that (1) normalization
may not reduce the variance of the data distribution (e.g., in cases in which the feature
standard deviation was already less than one) and (2) since the MSNN transformation is
non-linear, projection into the decision space could detrimentally alter the data
distribution within a cluster.

So,  instead  of  trying  to  obtain  an  optimal  output  by  pre-processing  the 
input  features,  a  second  variation  of  the  MSNN  would  instead  optimize  the  output 
obtained.  By  minimizing  the  variance  of  the  projected  data  while  still  maximizing  mean 
separation,  projection  cluster  overlap  would  be  reduced,  thereby  lowering  the  likelihood 
of  inaccurate  classification.  As  a  result  of  this  combination  of  actions,  a  large  variance 
may  be  tolerable  if  mean  separation  is  likewise  large;  while  a  smaller  spread  could  be 
unacceptable  for  closely  spaced  class  groupings.  Figure  III-15  illustrates  this  notion. 


Figure  III-15.  Relative  Significance  of  Mean  Separation  to  Variance. 


Shown in the decision space, Figure III-15 illustrates four combinations of
mean separation and variance and the resulting effect on classification capabilities. For
instance, plots (a) and (b) illustrate the obvious conditions with respect to distribution
variance. For a given mean separation, overlap is unlikely with low data spread (plot
(a)); while the converse is true with large variance (plot (b)). Figures III-15(c) and (d),
however, emphasize that it is the relative, and not absolute, magnitudes of mean
separation and variance that are significant. In plot (c), large overlap occurs despite low
variance; but in plot (d), no overlap results regardless of a large variance.
Therefore, the approach does appear to be more logical than either of the two earlier
MSNN models.

Executing  this  process,  however,  will  involve  changes  to  the  MSNN 
procedure. The MSNN class typing and decision-making phases depicted in Figures III-10
and III-12 are still applicable and will not require change; but aspects of the training
phase  will  need  revision.  Alterations  to  the  training  performance  measure  and  the 
training  termination  criteria  are  considered. 

b.  Modified  Mean-Difference  Projection  Index 
MSNN  training  with  projection  space  normalization  does  not  require 
modification  to  the  network  training  procedure.  The  processing  element  and  the  data 
flow path as depicted earlier in Figures III-7 and III-9 remain unchanged. The
performance  measure  specified  by  Equation  3.15,  however,  will  be  modified.  Taking 
into  consideration  the  projection  space  variance  of  the  two  transformed  data  distributions, 
the  new  mean-difference  projection  index  (MD2)  is  defined  as 

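$$
\begin{aligned}
\mathrm{MD2} &= -\frac{\left[E\{(20\,\Phi(w \cdot p_1 + b) - 10) - (20\,\Phi(w \cdot p_2 + b) - 10)\}\right]^2}{\mathrm{var}(20\,\Phi(w \cdot p_1 + b) - 10) + \mathrm{var}(20\,\Phi(w \cdot p_2 + b) - 10)} \\
&= -\frac{\left[E\{\Phi(w \cdot p_1 + b) - \Phi(w \cdot p_2 + b)\}\right]^2}{\mathrm{var}(\Phi(w \cdot p_1 + b)) + \mathrm{var}(\Phi(w \cdot p_2 + b))}, \qquad (3.29)
\end{aligned}
$$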

where Φ again represents the logsig activation function and var symbolizes the statistical
variance.

Because of this new projection index, the gradient portion of the mean-difference
learning rate must be recomputed. Taking the partial derivatives of MD2, as
specified by Equations 3.23 and 3.24, yields


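$$\frac{\partial\,\mathrm{MD2}}{\partial w} = 2K\left[K\left(E\left\{\alpha\frac{\partial \alpha}{\partial w} + \beta\frac{\partial \beta}{\partial w}\right\} - E\{\alpha\}\,E\left\{\frac{\partial \alpha}{\partial w}\right\} - E\{\beta\}\,E\left\{\frac{\partial \beta}{\partial w}\right\}\right) - E\left\{\frac{\partial \alpha}{\partial w} - \frac{\partial \beta}{\partial w}\right\}\right] \qquad (3.30)$$

$$\frac{\partial\,\mathrm{MD2}}{\partial b} = 2K\left[K\left(E\left\{\alpha\frac{\partial \alpha}{\partial b} + \beta\frac{\partial \beta}{\partial b}\right\} - E\{\alpha\}\,E\left\{\frac{\partial \alpha}{\partial b}\right\} - E\{\beta\}\,E\left\{\frac{\partial \beta}{\partial b}\right\}\right) - E\left\{\frac{\partial \alpha}{\partial b} - \frac{\partial \beta}{\partial b}\right\}\right], \qquad (3.31)$$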


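with the parameters K, α, and β defined as

$$K = \frac{E\{\alpha - \beta\}}{E\{\alpha^2 + \beta^2\} - E^2\{\alpha\} - E^2\{\beta\}},$$

$$\alpha = \Phi(w \cdot p_1 + b), \quad \frac{\partial \alpha}{\partial w} = \Phi'(w \cdot p_1 + b) \cdot p_1, \quad \frac{\partial \alpha}{\partial b} = \Phi'(w \cdot p_1 + b),$$

$$\beta = \Phi(w \cdot p_2 + b), \quad \frac{\partial \beta}{\partial w} = \Phi'(w \cdot p_2 + b) \cdot p_2, \quad \frac{\partial \beta}{\partial b} = \Phi'(w \cdot p_2 + b).$$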

As before, the logsig activation function, Φ, and its derivative are defined by

$$\Phi = \mathrm{logsig}(n) = \frac{1}{1 + \exp(-n)}, \qquad \Phi' = \mathrm{logsig}'(n) = \frac{1}{\exp(n)\,(1 + \exp(-n))^2}.$$

Note that MD2, α, β, and their derivatives with respect to the neural network bias are all
scalar quantities. The derivatives of these parameters with respect to the weight vector
are, on the other hand, vectors. This agrees with the MSNN learning rule equations,
Equations 3.23 and 3.24.
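The MD2 index and its partial derivatives can be checked numerically with a short sketch such as the one below. It is an illustration only, written in Python rather than the MATLAB of Appendix C; the function names are hypothetical, the variance is assumed to be the population variance (mean of squares minus squared mean), and the denominator is floored at 1e-10 as an assumed numerical safeguard against division by zero.

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

def dlogsig(n):
    s = logsig(n)
    return s * (1.0 - s)  # same value as 1 / (exp(n) * (1 + exp(-n))**2)

def mean(values):
    return sum(values) / len(values)

def md2_and_gradient(w, b, class1, class2, floor=1e-10):
    """Return (MD2, dMD2/dw, dMD2/db) for feature vectors given as lists."""
    def project(points):
        acts, d_w, d_b = [], [], []
        for p in points:
            n = sum(wi * pi for wi, pi in zip(w, p)) + b
            slope = dlogsig(n)
            acts.append(logsig(n))
            d_w.append([slope * pi for pi in p])  # derivative w.r.t. w (vector)
            d_b.append(slope)                      # derivative w.r.t. b (scalar)
        return acts, d_w, d_b

    alpha, da_w, da_b = project(class1)
    beta, db_w, db_b = project(class2)

    num = mean(alpha) - mean(beta)  # E{alpha - beta}
    den = (mean([a * a for a in alpha]) + mean([x * x for x in beta])
           - mean(alpha) ** 2 - mean(beta) ** 2)  # sum of projected variances
    den = max(den, floor)
    k = num / den
    md2 = -num * num / den

    def partial(da, db):
        # 2K[ K(E{a a' + b b'} - E{a}E{a'} - E{b}E{b'}) - E{a' - b'} ]
        spread = (mean([a * d for a, d in zip(alpha, da)])
                  + mean([x * d for x, d in zip(beta, db)])
                  - mean(alpha) * mean(da) - mean(beta) * mean(db))
        return 2.0 * k * (k * spread - (mean(da) - mean(db)))

    grad_w = [partial([row[j] for row in da_w], [row[j] for row in db_w])
              for j in range(len(w))]
    grad_b = partial(da_b, db_b)
    return md2, grad_w, grad_b
```

Comparing `grad_w` and `grad_b` against central finite differences of the returned MD2 value is a simple way to confirm that the analytic gradient and the index agree in sign and magnitude.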

With the projection index now expressed as a ratio of mean separation to
sum of projection variance, the range is no longer constrained to [-400,0]. In fact, in the
optimal situation, the sum of variance is zero and therefore MD2 is undefined.
Conceptually, a small variance and the resulting large magnitude for MD2 agree with
the best case situation described by the numerator of the projection index, that of a large
mean difference. But an infinitesimally small denominator causes computational
difficulties. To preclude this, the denominator of MD2 and its derivatives are limited to a
minimum value of 10^-10.

c. Modified Termination Requirement

The  training  phase  of  the  standard  MSNN  terminated  either  on  maximum 
epoch  limit,  minimum  learning  rate,  or  optimal  performance  measure.  The  first  two 
criteria  are  still  valid  within  the  framework  of  the  projection  space  variance  modification; 
however,  the  latter  case  no  longer  has  any  meaning.  In  the  best  case  scenario,  the 
performance  measure  is  unbounded  and  thus  cannot  be  used  to  end  training.  Multiplying 
the MD2 projection index by its denominator (i.e., the sum of projection variances) may
allow for implementation of a termination criterion; but this termination requirement
would amount to only the projection space mean separation, thereby ignoring the
relevance of data spread. Because of this, a new termination index that measured the
ratio of data variance to mean separation was defined.

Consider the projection space data distributions shown in Figure III-16.
Improving classification performance relies on maximizing the separation, ΔV, between
the points x1 and x2 relative to the mean separation, ΔM. Based on an error tolerance,
these points are found using statistical error function tables, assuming both projected data
sets are normally distributed. The termination parameter, the variance-mean ratio
(VMR), is then defined as ΔV/ΔM. For a given mean separation, imposing a threshold on
this ratio specifies the minimum spread value ΔV and consequently the allowed variance
of the projected class distributions. A more rigorous derivation of this parameter follows.

The primary assumption needed for the derivation of the VMR criterion is
that,  in  the  decision  domain,  the  projected  data  is  normally  distributed.  By  making  this 
claim,  error  function  tables  and  known  characteristics  of  normal  distributions  can  be  used 
to  analytically  derive  VMR.  But,  to  verify  this  supposition  requires  examining  the 
attributes of the projected data. Figures III-17 through III-20 illustrate the transformed
data distributions for each class of a two-class classification problem. Plots (a) and (b)
display the normality plots of the resulting distributions. A non-vertical, linear plot of the
data marks superimposed on the dashed line denotes a Gaussian distributed data set. In
contrast, a curvature in the plotting of these marks indicates a departure from normality.
Plots  (c)  and  (d)  are  the  corresponding  histograms. 

In the optimal case (Figure III-17), the data is far from Gaussian. This,
however, is desired. Instead of the expected bell-shaped data distribution characteristic
of a Gaussian curve, the data shown in Figure III-17 shows one vertical bar. Recall that
when optimally trained, the MSNN processing element will precisely map one class to 10
and the other to -10, as shown. As will be defined shortly, VMR for this case is 1 and the
assumption of normality is not required.

In the least desired situation depicted by Figure III-18, the data is again far
from  Gaussian.  Although  two  vertical  bars  are  now  shown  for  each  class,  indicating  poor 
data  classification,  all  mappings  are  precisely  to  one  of  the  extreme  values. 
Consequently,  mapping  into  the  projection  space  did  not  result  in  data  overlap  and  the 
assumption  of  normality  is  again  not  required. 

In the intermediate cases shown in Figures III-19 and III-20, it is apparent
that the transformation into the decision space was not precise. As a result, data overlap
may occur. In three of the four cases shown (both classes of Figure III-19 and class π2
of Figure III-20) the distributions are nearly normal, so the initial assumption holds. For
class π1 of Figure III-20, however, the normality plot indicates that the tail of the
distribution extends further out than that of a normally distributed data set. This implies a
greater amount of data overlap than assumed by a Gaussian distribution. Fortunately, this
situation is atypical. Because of the logsig activation function, the input data tends to
map to one of the optimum values (i.e., 10 or -10). Yet, to compensate for this aberrant
case, stringent requirements will be placed on VMR.

Figure III-17. Example of Projected Data Distribution. (a) Class π1 Normality Plot (b) Class π2 Normality Plot (c) Class π1 Histogram (d) Class π2 Histogram.

Figure III-20. Example of Projected Data Distribution. (a) Class π1 Normality Plot (b) Class π2 Normality Plot (c) Class π1 Histogram (d) Class π2 Histogram.

Accepting the assumption of a normally distributed data projection, the
derivation for VMR is as follows. For the two classes shown in Figure III-16, the
projection of class π1 has a mean μ1 and standard deviation σ1. Correspondingly, the
projection of class π2 has a mean μ2 and standard deviation σ2. Unlike in the feature
space, the class means and standard deviations are scalar quantities owing to the
one-dimensional projection by the neural network’s linear and non-linear mappings.

Using error function tables, the error tolerance specifies the location
of x1 and x2 on the projection axis. For instance, with an allowable error set at 0.5%, the
threshold points for a zero-mean, unit-variance, normally distributed class are ±2.52 units
from the mean. That is, 0.5% of the distribution resides in the tails beyond these locations.
Applying the known statistical parameters of the actual classes, these positions are found
to be


$$x_a = \mu_a + 2.52\,\sigma_a, \qquad (3.32)$$

$$x_b = \mu_b - 2.52\,\sigma_b. \qquad (3.33)$$

In Equations 3.32 and 3.33, the subscripts a and b are used to derive the
formulae without having knowledge of the actual orientation of classes π1 and π2. In the
general sense, subscript b refers to the class with the more positive mean. So, in terms of
Figure III-16, x_a corresponds to x1; x_b to x2. Taking the difference of x_b and x_a yields ΔV:


$$
\begin{aligned}
\Delta V &= x_b - x_a \\
&= (\mu_b - 2.52\,\sigma_b) - (\mu_a + 2.52\,\sigma_a) \\
&= (\mu_b - \mu_a) - 2.52\,(\sigma_b + \sigma_a). \qquad (3.34)
\end{aligned}
$$

Using Equation 3.34, the variance-mean ratio (VMR) can be expressed as

$$
\mathrm{VMR} = \frac{\Delta V}{\Delta M} = \frac{(\mu_b - \mu_a) - 2.52\,(\sigma_b + \sigma_a)}{\mu_b - \mu_a} = 1 - \frac{2.52\,(\sigma_b + \sigma_a)}{\mu_b - \mu_a}. \qquad (3.35)
$$

To account for cases in which improper class assignment results in the mean of class a
being more positive than the mean of class b, an absolute value is introduced to
emphasize the magnitude and not the sign of the difference in means. Equation 3.35
therefore becomes

$$\mathrm{VMR} = 1 - \frac{2.52\,(\sigma_b + \sigma_a)}{|\mu_b - \mu_a|}. \qquad (3.36)$$


Note that the second term of Equation 3.36 is subtracted, not added: a larger combined
spread relative to the mean separation lowers VMR.

Equation 3.36 establishes how tightly clustered the class projection into
the decision space must be. Since a VMR of zero would incur only the
acceptable error limit (here, 0.5% error) for a Gaussian distributed data sample, a VMR
greater than zero imposes an even stricter requirement on projected class variance. This
compensates for any situations in which the data distribution is not Gaussian and
institutes the precision required of the neural network training. Caution must be observed
for negative VMR values. These imply a mean separation that is smaller than 2.52 times the
sum of the projected standard deviations and hence a large degree of overlap.
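As a worked illustration of Equation 3.36, the sketch below evaluates VMR from two lists of projected values. It is a minimal example under stated assumptions: the function name is hypothetical, the population standard deviation is assumed, and the multiplier 2.52 is taken directly from the text.

```python
import statistics

def variance_mean_ratio(proj_a, proj_b, z=2.52):
    """VMR = 1 - z * (sigma_a + sigma_b) / |mu_b - mu_a| for projected class data.

    proj_a, proj_b: decision-space projections of the two classes.
    Raises ZeroDivisionError if the two projected means coincide.
    """
    mu_a, mu_b = statistics.fmean(proj_a), statistics.fmean(proj_b)
    sd_a, sd_b = statistics.pstdev(proj_a), statistics.pstdev(proj_b)
    return 1.0 - z * (sd_a + sd_b) / abs(mu_b - mu_a)

# Ideal mapping: every point lands on -10 or +10, so the spread is zero.
ideal = variance_mean_ratio([-10.0] * 5, [10.0] * 5)

# Heavily overlapping projections give a negative ratio.
overlapping = variance_mean_ratio([-1.0, 0.0, 1.0], [0.0, 1.0, 2.0])
```

In the ideal case the ratio equals 1, while the overlapping case is negative, matching the interpretation given above.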

During  actual  implementation,  VMR  terminated  the  training  cycle  only 
after  an  improvement  in  MD2  (i.e.,  a  more  negative  value).  In  retrospect,  however, 
checking  MD2  was  not  required.  Since  this  modification  considers  both  mean  separation 
and  projection  variance,  an  increase  in  mean-difference  (MD2)  does  not  necessarily 
indicate  worsening  conditions,  as  it  does  for  the  mean-difference  (MD)  of  the  standard 
MSNN.  Consequently,  network  training  should  have  been  stopped  on  VMR  threshold, 
maximum  epoch  limit,  or  minimum  learning  rate,  without  consideration  for  the  MD2 
projection  index. 
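The stopping logic recommended in retrospect can be sketched as a single check performed once per epoch. This is a hypothetical illustration: the function and parameter names are not from the thesis code, the default VMR threshold of 0.90 is the value used later in the simulations, and the epoch and learning rate limits are placeholder values chosen only for the example.

```python
def should_stop(epoch, learning_rate, vmr,
                vmr_threshold=0.90, max_epochs=1000, min_learning_rate=1e-6):
    """Return the reason training should stop, or None to continue.

    MD2 is deliberately not consulted: only the VMR threshold, the epoch
    limit, and the learning rate floor end training.
    """
    if vmr >= vmr_threshold:
        return "vmr_threshold"
    if epoch >= max_epochs:
        return "max_epochs"
    if learning_rate <= min_learning_rate:
        return "min_learning_rate"
    return None
```

Returning the reason, rather than a bare flag, makes it straightforward to record afterward how often training ended on VMR versus the epoch limit.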

3.  Further  Implementation  of  the  Variance-Mean  Ratio 

Perhaps the strength of the projection space normalization modification does not lie
in the upgraded performance parameter, MD2, as originally intended, but rather in the
termination parameter, VMR. Because of this possibility, the third MSNN variation used
VMR, vice the empirically determined ninety-percent of optimal MD, as the training
termination requirement for the original MSNN method.

E.  SUMMARY 

Chapter III discussed several techniques used to classify observations. These
methods include a parametric statistical classifier and five neural network architectures.
The statistical classifier of interest was a quadratic classifier. The decision rule for this
method was derived and its applicability to normally distributed data was highlighted.

The first neural network examined was the single-layer perceptron. This
neuro-classifier used linear separation boundaries to partition classes into their own separate
spaces. The primary difficulty encountered with the perceptron networks was the
inability to use optimization techniques to guide the network’s training. Instead, a simple
rule, albeit powerful under certain situations, governs perceptron learning.

Next,  the  Mean  Separator  Neural  Network  (MSNN)  first  introduced  by  Duzenli 
and Fargues was explained. This network architecture and variations on its design are the
principal focus of this study. Classification with MSNN is performed by projecting data
onto a one-dimensional axis. The mean-difference (MD) performance parameter
maximizes  the  separation  between  class  mean  values,  enabling  classification  of 
observations  to  the  proper  category  by  using  a  distance  metric. 

Improved  performance  was  sought  by  modifying  the  MSNN  to  consider  the  data 
variance.  One  alternative  mean-separator  normalized  the  input  space  in  an  attempt  to 
produce  tight  class  clusters.  A  second,  more  promising,  approach  normalized  the 
projection space using an upgraded performance parameter, MD2, and a new training
termination criterion, VMR. Together, these metrics maximized the projected mean
separation while also tightening the decision space data spread, reducing data cluster
overlap. Hypothesizing, however, that the primary driver in restricting this overlap was
the termination parameter, VMR, and not the modified performance parameter, MD2,
classification using the standard MSNN projection index, MD, coupled with the new
termination criterion was considered as a third modification to the MSNN. In the
following chapters, these MSNN variants - MSNN with preconditioned input space,

IV. VERIFICATION OF CLASSIFIER PERFORMANCE


Chapter III introduced and explained the implementation of the different
classifiers considered in this study: one parametric classifier and five neural networks.
This  chapter  assesses  these  methods  through  simulations.  MATLAB  program  codes  used 
during  these  trials  are  provided  in  Appendix  C. 

A.  SIMULATION  PROTOCOL 

A  three-class  separation  problem  was  considered  to  test  the  performance  of  the 
subject  classification  methods.  Working  in  three-,  ten-,  and  fifty-dimension  input  spaces, 
the  classifiers  used  100  training  objects  per  class  to  model  the  data  and  then  used  this 
representation  to  categorize  1000  trial  observations  per  class.  Performing  the  tests  under 
various  noise  conditions  emphasized  the  robustness  of  the  classification  methods. 
Specifically,  the  signal-to-noise  ratios  (SNRs)  simulated  were  ±20  dB,  ±15  dB,  ±10  dB, 
±5  dB,  and  0  dB.  Absent  from  this  list  is  the  no-noise  case  since  generation  of  zero- 
variance  data  would  identify  only  one  point  for  each  class. 

Constructing  the  training  and  testing  data  objects  required  determining  class 
statistics.  The  mean  values  for  each  class  feature  were  randomly  selected  from  a  uniform 
distribution.  To  focus  the  initial  neural  network  activity  in  the  logsig  dynamic  range  and 
thereby  prevent  neural  network  saturation,  these  mean  values  were  constrained  to  [-1,1]. 
During  real-time  analysis,  signal  power  is  normalized.  Hence,  the  normalized  sum  of  n 
feature  variances  gives  signal  SNR,  as  shown  by  Equation  4.1: 


SNR = 10 log          (4.1)

Consequently, when SNR is known, Equation 4.1 can be used to randomly select each
feature variance from a uniform distribution.

Having  randomly  specified  the  mean  and  established  the  variance  values  for  each 
class,  Gaussian  distributed  features  were  simulated  to  form  the  300  training  and  3000 
testing observations (100 and 1000 for each class) required per trial. Examples of a
three-class, three-feature classification problem with low and high noise conditions are
illustrated in Figures IV-1 and IV-2. The two-dimensional plots in each figure depict data
projection onto two of the three dimensions. As expected, decreased SNR resulted in
increased data overlap, thereby suggesting increased classification difficulty.
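The data construction just described can be sketched as below. This is an assumed, simplified illustration rather than the thesis code: the function name and arguments are hypothetical, and the feature variances are passed in directly because the variance selection based on Equation 4.1 is not reproduced here.

```python
import random

def make_class_data(n_features, n_train=100, n_test=1000, variances=None, rng=None):
    """Generate Gaussian training and testing observations for one class.

    Feature means are drawn uniformly from [-1, 1]; `variances` (one value per
    feature) defaults to unit variance purely for illustration.
    """
    rng = rng or random.Random()
    means = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    variances = variances or [1.0] * n_features

    def draw(count):
        return [[rng.gauss(m, v ** 0.5) for m, v in zip(means, variances)]
                for _ in range(count)]

    return means, draw(n_train), draw(n_test)

# Three classes, three features: 300 training and 3000 testing observations.
rng = random.Random(0)
classes = [make_class_data(3, rng=rng) for _ in range(3)]
```

Larger variances for the same means reproduce the qualitative effect noted above: the class clouds spread and begin to overlap.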


Figure IV-1. Example of 3-Feature Data for Classification (low noise).

Figure IV-2. Example of 3-Feature Data for Classification (high noise).

Lastly, after creating the artificial feature vector, the data was normalized for
MSNN  Mod  1  implementation  (as  specified  by  Equation  3.28)  and  the  training  data 
covariance  matrix  was  calculated  for  use  by  the  statistical  classifier.  The  results  obtained 
with  this  parametric  classifier  are  considered  next. 

B.  INDIVIDUAL  CLASSIFIER  PERFORMANCE 

1.  Statistical  Classifier 

Chapter III defined the quadratic classifier decision rule as

$$d_i(x) = \ln|\Sigma_i| + (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - 2 \ln P_i. \qquad (3.8)$$

This classifier categorized testing objects by selecting the class that resulted in the lowest
value for the distance quantifier. The observations x, covariance matrix Σ, and mean
vector μ were obtained as earlier explained. The a priori probability, P_i, was determined
by assuming equal likelihood for all class types; P_i = 1/m, with m being the number of
classes.
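A minimal sketch of the decision rule of Equation 3.8 follows. To stay self-contained it assumes a diagonal covariance matrix for each class, a simplification for illustration only (the study computed the training data covariance matrix); with a diagonal matrix the log-determinant reduces to a sum of log-variances and the quadratic form to a sum of scaled squared differences. All names are hypothetical.

```python
import math

def quadratic_distance(x, mean, variances, prior):
    """d_i(x) = ln|S_i| + (x - mu_i)^T S_i^{-1} (x - mu_i) - 2 ln P_i,
    evaluated for a diagonal covariance S_i with the given variances."""
    log_det = sum(math.log(v) for v in variances)
    quad = sum((xj - mj) ** 2 / v for xj, mj, v in zip(x, mean, variances))
    return log_det + quad - 2.0 * math.log(prior)

def classify(x, class_params):
    """Select the class with the lowest distance, assuming equal priors 1/m.

    class_params: dict label -> (mean vector, per-feature variances)
    """
    prior = 1.0 / len(class_params)
    return min(class_params,
               key=lambda label: quadratic_distance(x, *class_params[label], prior))
```

With equal priors the last term is the same for every class, so it does not change which class is selected; it is kept here only to mirror Equation 3.8.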

Recall  a  crucial  assumption  made  during  the  derivation  of  Equation  3.8  required 
that  the  observations  x  form  a  normally  distributed  data  set.  The  trials  met  this 
prerequisite  by  using  a  normally  distributed  random  generator  to  produce  the  artificial 
signal  features.  Since  these  random  variables  were  created  without  interdependence  and 
are  therefore  uncorrelated,  the  joint  distribution  of  the  random  variables  is  a  product  of 
the  individual  distributions.  Hence,  the  observations  are  multivariate  normal,  indicating 
the  quadratic  classifier  can  be  used. 

With the applicability of the quadratic classifier established, 3000 test
objects per trial were classified. For all combinations of the nine SNR levels and three
input space sizes, five trials were conducted. This amounts to the classification of
405,000 test objects. For convenience, the simulation results obtained for this and all
other classifiers are collected in Appendix B. Tables B-1 through B-3 contain
classification confusion matrices of the statistical classifier trials and Figure B-1 plots the
performance indices indicated by these tables. These results indicate that the quadratic
classifier performed remarkably well under the simulated conditions. As expected,
misclassification decreased with increased SNR and feature space size. A comparison of
all classification techniques will be discussed later.
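The count of classified test objects follows directly from the simulation protocol, as the short check below shows. The variable names are illustrative only.

```python
snr_levels_db = [20, 15, 10, 5, 0, -5, -10, -15, -20]  # nine SNR levels
input_dims = [3, 10, 50]                               # three input space sizes
trials_per_condition = 5
test_objects_per_trial = 3 * 1000                      # three classes, 1000 each

total = (len(snr_levels_db) * len(input_dims)
         * trials_per_condition * test_objects_per_trial)
print(total)  # 405000
```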

2.  Perceptron 

The  quadratic  classifier  models  each  class  based  on  the  statistical  parameters  of 
the  training  data.  The  neural  network  classifiers,  however,  use  a  non-parametric  learning 
algorithm  to  train  the  network  for  class  recognition.  That  is,  the  actual  data,  and  not  its 
distribution information, are used to train the network to differentiate the classes.

One  consequence  of  neuro-classifier  training,  however,  is  the  absence  of  a  unique 
solution  in  many  circumstances.  For  instance,  in  the  case  of  the  perceptron  neural 
network,  different  decision  boundaries  arise  dependent  on  the  initial  weight  and  bias 
values.  Recall,  perceptron  training  was  governed  by  the  learning  rules  defined  by 
Equations  3.11  and  3.12: 

$$w^{new} = w^{old} + e \cdot p^T = w^{old} + (t - a) \cdot p^T \qquad (3.11)$$

$$b^{new} = b^{old} + e = b^{old} + (t - a). \qquad (3.12)$$

Since  the  update  terms  in  Equations  3.11  and  3.12  are  indirectly  affected  by  the  old 
weight  and  bias  values  through  a,  perturbations  in  the  initial  weight  and  bias  settings  can 
alter  the  final  solution.  In  addition,  there  is  no  way  to  tell  if  an  alternate  weight  and  bias 
will  improve  network  training;  there  is  no  method  to  determine  the  best  starting  point  for 
perceptron  training.  To  account  for  this  uncertainty,  the  perceptron  neural  network  was 
trained  five  times  for  each  set  of  training  data.  For  each  network  re-training,  random 
generation  ensured  different  weight  and  bias  initializations  were  used.  This  process  was 
then  repeated  with  five  different  training  data  sets  to  test  network  durability. 
Consequently,  overall  the  perceptron  was  trained  twenty-five  times  for  each  noise  and 
input  space  condition  to  provide  for  a  more  general  understanding  of  its  capabilities. 
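The retraining procedure can be illustrated with a single processing element trained under Equations 3.11 and 3.12 from several random starting points. This sketch makes several assumptions: a hard-limit activation with outputs 0 and 1, a two-class problem, and initial weights and bias drawn uniformly from [-1, 1]; the function names and the small data set are hypothetical.

```python
import random

def hardlim(n):
    return 1 if n >= 0 else 0

def predict(w, b, p):
    return hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)

def train_perceptron(samples, targets, rng, max_epochs=100):
    """Train one perceptron element from a random initialization."""
    w = [rng.uniform(-1.0, 1.0) for _ in samples[0]]
    b = rng.uniform(-1.0, 1.0)
    for _ in range(max_epochs):
        updates = 0
        for p, t in zip(samples, targets):
            e = t - predict(w, b, p)  # error term (t - a)
            if e != 0:
                w = [wi + e * pi for wi, pi in zip(w, p)]  # Equation 3.11
                b = b + e                                   # Equation 3.12
                updates += 1
        if updates == 0:  # every training sample is classified correctly
            break
    return w, b

samples = [[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.5]]
targets = [1, 1, 0, 0]
solutions = [train_perceptron(samples, targets, random.Random(seed))
             for seed in range(5)]
```

Each entry of `solutions` separates the training data, yet the weight and bias values generally differ from seed to seed, which is the non-uniqueness that motivated repeated training.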

After  each  network  training,  the  perceptron  classified  1000  objects  for  each  class 
per  trial;  in  excess  of  two  million  objects  over  all  simulations.  Tables  B-4  through  B-6 
and Figure B-2 summarize the results of these trials. However, not all test data was typed
to one of the possible classes. As previously explained, this peculiarity arises when the
number of class possibilities (2^μ for a network of μ processing elements) exceeds the
number of actual classes. Table IV-1 indicates the percentage of such occurrences for
each SNR level and input feature size.

0
-5
-10   14.6   13.6   5.1
-15   17.7   16.8
14.6   19.1   15.7

Table IV-1. Observed Percentage of Perceptron Non-Type Classification.

Tables  B-4  through  B-6  and  IV-1  indicate  acceptable  results  at  positive  SNR  levels,  but 
severely  degraded  perceptron  performance  with  increased  non-type  classifications  in 
noisy  environments.  In  large  part  this  is  attributable  to  the  linear  decision  boundaries 
used  to  separate  the  different  classes.  As  SNR  decreases,  resulting  in  increased  data 
encroachment  into  neighboring  partitions  and  ultimately  more  cluster  overlap,  the 
perceptron’s  linear  separators  cannot  adequately  maintain  class  division.  Consequently, 
classification  performance  suffered. 

3.  MSNN  Methods 

The  quadratic  classifier  and  perceptron  served  as  benchmarks  for  measuring 
MSNN  performance.  For  the  same  reason  that  the  perceptron  was  subjected  to  multiple 
training cycles, each MSNN variation was trained with five different weight and bias
initializations for each set of 100 training observations per class for a three-class setup.
To  reiterate,  the  MSNN  alternatives  were 

1.  Standard  MSNN 

2.  MSNN  Mod  1:  MSNN  with  feature  space  preconditioning 

3.  MSNN  Mod  2:  MSNN  with  projection  space  normalization 

4.  MSNN  Mod  3:  Standard  MSNN  with  VMR  termination 

For the modifications that utilized the VMR termination parameter (variations 3 and 4),
ΔV was based on 0.5% of the observations residing in the fringes of the data distribution
and the VMR threshold was set at 0.90. With these stringent criteria, minimal data
overlap is expected when network training terminates on VMR. Unfortunately, a
post-simulation record review revealed that this was not the case, as network training often
terminated on maximum epoch limit.

Once  trained,  the  tuned  networks  classified  3000  test  objects  per  run.  As 
previously  stated,  this  training/testing  scheme  was  repeated  with  five  different  data  sets  to 
quantify  network  robustness.  Simulation  results  are  presented  on  Tables  B-7  through  B-9 
and  on  Figure  B-3  for  the  standard  MSNN;  on  Tables  B-10  through  B-12  and  Figure  B-4 
for  MSNN  Mod  1;  on  Tables  B-13  through  B-15  and  Figure  B-5  for  MSNN  Mod  2;  and 
on  Tables  B-16  through  B-18  and  Figure  B-6  for  MSNN  Mod  3.  Not  surprisingly,  neural 
network  performance  deteriorated  with  increased  noise  levels  and  decreased  feature  space 
size. 

In  addition  to  these  results,  it  is  also  instructive  to  note  some  characteristics  of  the 
MSNN implementation not pertinent to either the statistical classifier or perceptron
neural network. For instance, plotting the surface of the mean-difference parameter, MD,
over a range of weight and bias values provides insight into the behavior of the network
training trajectory. Unfortunately, plotting limitations prevent graphical representations
of the MD projection index and every element of the simulated feature space since this
would require hyperspace imaging. At most only two degrees of freedom could be used
to form the three-dimensional image of a particular projection index surface. Therefore,
a one-dimensional classification problem was analyzed.

Figures  IV-3  and  IV-4  illustrate  a  one-dimensional  classification  problem  and  the 
neuron  map  for  its  sole  standard  MSNN  processing  element.  In  particular,  Figure  IV-4 
confirms  successful  network  training,  as  the  test  points  for  each  class  map  to  the  same 
unique specifier and provide for maximum mean separation.

Figure IV-4. MSNN Neuron Map of 1-Feature Data.


Since the feature space comprises only one element, plotting the projection
index surface can be achieved by considering a scalar weight and bias. This is shown in
Figure IV-5. Here the upper two graphs display the MD surface characteristics in the
vicinity of the trained solution and representative contours; the lower two, a more global
depiction over a wider range of weight and bias values.


Figure  IV-5.  MSNN  Local  and  Global  Surface  and  Contour  Plots. 


The  MSNN  solution  and  corresponding  mean-difference  rating  of  -400  confirm 
the  successful  network  training  suggested  by  the  network’s  neuron  map.  In  addition,  the 
regularity  of  the  MD  surface  implies  that  network  resolution  to  the  final  weight  and  bias 
values  was  unencumbered  by  any  local  minima  obstacles. 

Recall that a mean-difference of zero is the least desired case. Figure IV-5 shows
this occurring for a weight of zero regardless of bias, and for large magnitude weight and
bias values. This latter case corresponds to processing element saturation. Interestingly,
Figure IV-5 also suggests that in this trial the bias was not a vital contributor to obtaining
the optimal MD value. Both the local and global plots reveal that an MD value of -400
can be attained with a relatively small bias. This, however, is primarily a function of the
class data and not a general trait of the mean separator transformation (Equation 3.14). In all
one-dimensional cases examined, the class means were bipolar. That is, the means of the
data distributions were created such that they had opposite signs. Consequently, the
inherent data distribution bias (i.e., combined mean of the two classes) was near zero,
indicating little need to impose an external bias to maximize mean separation.
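
The surface behavior described above can be reproduced with a small numerical sketch. It assumes a logsig (logistic sigmoid) processing element and uses the negated absolute difference of the projected class means as a stand-in for the mean-difference index. This stand-in is an illustrative assumption, not Equation 3.14 or 3.15, and the function names and data values are hypothetical.

```python
import math

def logsig(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mean_difference(w, b, class_a, class_b):
    """Hypothetical stand-in index: negated |difference of projected class means|."""
    mean_a = sum(logsig(w * x + b) for x in class_a) / len(class_a)
    mean_b = sum(logsig(w * x + b) for x in class_b) / len(class_b)
    return -abs(mean_a - mean_b)

# Bipolar one-feature classes (means of opposite sign); values are made up.
class_a = [-1.2, -1.0, -0.8]
class_b = [0.8, 1.0, 1.2]

# Zero weight: both classes project to logsig(b), so the index is zero for any bias.
flat = [mean_difference(0.0, b, class_a, class_b) for b in (-5.0, 0.0, 5.0)]
# Large bias: the processing element saturates and the index again collapses to zero.
saturated = mean_difference(1.0, 50.0, class_a, class_b)
# Moderate weight with zero bias: the class means are driven toward opposite rails.
separated = mean_difference(10.0, 0.0, class_a, class_b)
```

Under these assumptions the index vanishes along the zero-weight axis and in the saturated region, while a zero bias suffices for near-maximal separation because the class means straddle the origin.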

Yet,  in  general,  examination  of  the  mean  separator  transformation  suggests  that 
the  role  of  the  bias  is  as  a  linear  translator  of  the  activation  function  output.  The  bias 
merely  shifts  the  characteristic  logsig  plot  horizontally.  Consequently,  bias  can  be 
disregarded  and  in  its  place,  a  second  weight  component  considered.  By  considering  this 
second  weight  feature,  greater  insight  into  the  presence  or  absence  of  local  minima  and 
subsequently  their  effect  on  neural  network  performance  can  possibly  be  gained.  Figures 
IV-6  (low  noise)  and  IV-7  (high  noise)  illustrate  such  a  two-dimensional  problem.  The 
neuron  maps  (Figures  IV-8  through  IV-22,  even)  and  mean-difference  surface  and 
contour  plots  (Figures  IV-9  through  IV-23,  odd)  for  the  four  MSNN  variants  follow. 
From  these  figures,  it  is  worth  noting  the  consistency  (or  lack  thereof)  in  the  neuron  maps 
and  any  eccentricity  in  the  shape  of  the  surface  plots. 


Figure IV-6. Example of 2-Feature Data for Classification (low noise).


Figure IV-7. Example of 2-Feature Data for Classification (high noise).


For instance, Figures IV-10 and IV-18 suggest the futility of data preconditioning
prior to network training and classification. MSNN Mod 1 repeatedly produced the
least consistent neuron mappings and often the smallest mean spread. As further confirmed
by the low mean-difference indices of -134 and -174 shown on Figures IV-11
and IV-19, respectively, the resulting sub-optimal mean separation led to poor classification
performance.

On  the  other  hand,  the  neuron  maps  and  surface/contour  plots  for  the  remaining 
three  MSNN  variants  indicate  optimal  network  training  achieved  with  the  high  SNR 
condition.  Figures  IV-8,  IV-12,  and  IV-14  depict  the  maximal  separation  between  class 
means  and  Figures  IV-9,  IV-13,  and  IV-15  report  the  optimal  value  for  the  mean- 
difference  projection  index.  For  the  standard  MSNN  and  MSNN  Mod  3,  this  MD  value 
is  given  by  Equation  3.15;  for  MSNN  Mod  2,  MD2  is  calculated  using  Equation  3.29. 
Moreover, the MSNN Mod 2 mean-difference value of -10^10 implies a sum of projection
space variances much less than 10^-7, suggesting that transformation into the decision
domain resulted in a high degree of precision and essentially no data overlap.
Graphically,  this  accounts  for  the  vertical  slope  found  on  the  performance  surface  of 
Figure  IV-13,  as  opposed  to  the  more  gradual  descents  seen  on  other  plots.  Such  a 
favorable mapping greatly simplifies the classification task.

Figure IV-8. MSNN Neuron Map of 2-Feature Data (low noise).


Figure  IV-9.  MSNN  Local  and  Global  Surface  and  Contour  Plots  (low  noise). 




Figure IV-10. MSNN Mod 1 Neuron Map of 2-Feature Data (low noise).


Figure IV-13. MSNN Mod 2 Local and Global Surface and Contour Plots (low noise).




Figure  IV-14.  MSNN  Mod  3  Neuron  Map  of  2-Feature  Data  (low  noise). 


Figure IV-15. MSNN Mod 3 Local and Global Surface and Contour Plots (low noise).


Figure IV-21. MSNN Mod 2 Local and Global Surface and Contour Plots (high noise).


Figure  IV-22.  MSNN  Mod  3  Neuron  Map  of  2-Feature  Data  (high  noise). 


The superior performance of these MSNN variants relative to the MSNN Mod 1
approach is also displayed on the figures representative of high noise conditions.
Moreover, these plots illustrate the effect of added noise. The wide-range global plots
indicate that as the noise level increases, the area of optimal mean-difference decreases.
For instance, consider the results of MSNN Mod 2 shown on Figures IV-13 and IV-21.
Whereas the optimal region envelops a large area in the low noise case, with increased
noise corruption the maximal MD2 can only be attained through a narrow selection of weight
values. Since fewer weight combinations result in the optimal MD2 value, the
likelihood of attaining an acceptably trained network is lower. Consequently, more
misclassifications are probable.

Also  notice  that  the  low  SNR  plots  indicate  a  greater  directionality  towards  a 
particular  weight  component,  reminiscent  of  what  was  observed  in  the  one-dimensional 
case.  But,  unlike  the  earlier  observation,  this  is  not  a  result  of  the  simulation  protocol 
(i.e.,  creating  intrinsically  low  bias  conditions).  For  the  two-dimensional  case,  this 
directionality  results  from  the  inner  product  of  the  weight  vector  and  actual  data  used, 
and  therefore  will  change  from  simulation  to  simulation. 

Curiously, the results obtained with the MSNN Mod 3 were exactly the same as
those achieved by the standard MSNN. Recall that the principal advantage of using the VMR
termination criterion is that this parameter places a requirement on projection data variance
in addition to projection mean spread. By considering both parameters, data overlap is
minimized. Unfortunately, network training often did not terminate on reaching the VMR
threshold. Instead, the MSNN Mod 3 variant terminated the training phase when the
number of training epochs exceeded the established limit. Because of this, future MSNN
studies should increase the epoch limit and reformulate the network guidance (i.e., the
learning rate rules) to take advantage of the VMR criterion while still allowing for a
dynamic learning capability.

Analysis  thus  far  has  focused  on  the  performance  of  the  individual  classification 
methods. The next section compares the six classification tools.

C. CLASSIFIER COMPARISON


Analysis  of  the  classification  techniques  provided  initial  insight  into  their 
capabilities.  The  most  revealing  fact  learned,  however,  does  not  concern  the  benefits 
gained  by  a  specific  method,  but  instead  speaks  to  the  ineffectiveness  of  one  under  the 
prescribed  test  conditions.  The  inability  of  MSNN  Mod  1  (preconditioned  input  data)  to 
satisfactorily  classify  data  objects  was  most  notable  on  neuron  mapping  plots  of  the  input 
observations  into  the  decision  space  (Figures  IV-10  and  IV-18).  These  figures  showed 
imprecise  projection  of  the  input  data. 

The results of each classifier must be compared to determine if the neural network
modification improved classification performance. Unfortunately, Figures IV-8 through
IV-23 and Appendix B do not facilitate performance comparison of the six classification
techniques. This contrast, however, can be gleaned by fusing the information found on
Figures B-1 through B-6 into three plots differentiated by input space size, shown as
Figures IV-24 through IV-26. For the purposes of this evaluation, reliable classification
capabilities are demonstrated at each SNR level if the average correct classification
percentage exceeds ninety percent.

Using this standard, the statistical classifier achieved the most accurate level of
performance. For a small feature space, the parametric classifier attained over ninety-percent
accuracy at an SNR of 7 dB. As input space dimensionality increased to fifty
features, this performance level was maintained for all SNRs. This high classification
success can be attributed to the classifier’s ability to minimize classification error, as
alluded to in Chapter III. Since the artificial features were normally distributed and
independently created, the data set was well conditioned, allowing for optimum
performance of the statistical classifier.

For the MSNN variants, Figures IV-24 through IV-26 do not clearly indicate
which technique performs best. The greatest distinction is discernible in the three-feature
input space. As shown on Figure IV-24, there is little difference between the
performance of the standard MSNN and MSNN Mod 2, with each maintaining the
ninety-percent accuracy level down to 5 and 6 dB, respectively. MSNN Mod 3 met this
limit at 11 dB and then paralleled the standard MSNN and MSNN Mod 2 algorithms with
a slight offset. Not unexpectedly, MSNN Mod 1 proved to be the least successful
technique, with all SNRs resulting in sub-ninety-percent accuracy.


Figure IV-25. Performance Comparison: Simulated Features (10).


Figure IV-26. Performance Comparison: Simulated Features (50).

In general, as the number of input features increased, all classifiers showed
greater classification success. Moreover, MSNN Mod 1 surprisingly showed improved
performance equal to the standard MSNN and MSNN Mod 2 methods in the ten- and
fifty-dimension feature spaces. With these feature space dimensionalities, ninety-percent
classification accuracy was sustained down to 0 dB and -7 dB, respectively, for the three
MSNN variants listed.

Curiously, the MSNN Mod 3 variant demonstrated the least amount of
improvement. For instance, Figure IV-26 indicates a twenty-percent disparity between this
hybrid method and the standard MSNN at SNRs of -5 dB and -10 dB. This difference
and lack of significant improvement can again be attributed to MSNN Mod 3 terminating
its training on the maximum epoch limit instead of on the VMR threshold. Unlike the standard
MSNN, which re-initializes its weights and bias and retrains the network when network
learning ceases prior to satisfactory training, MSNN Mod 3 implements the weights and
bias it had attained when a termination parameter setpoint is reached. Since acceptable
network training may not have been achieved, poor classification performance would
result.

With an input space dimensionality of three, the perceptron performed on par with
the MSNN Mod 3 variant down to 15 dB. Below this SNR level, the decline in perceptron
performance can be attributed to greater data noise, with the resulting increased data overlap
limiting the network’s ability to establish linear class boundaries.

D.  SUMMARY 

Chapter  IV  utilized  simulated  data  consisting  of  artificial  feature  elements  to 
measure  classification  method  performance.  Considering  varying  noise  and  input  space 
size,  data  sets  of  300  training  and  3000  testing  objects  were  created.  For  the  statistical 
classifier,  ten  such  data  sets  were  created  for  all  combinations  of  SNR  and  feature  space 
size.  For  the  neural  network  trials,  five  data  sets  were  simulated.  In  addition,  because  of 
a  dependence  on  weight  and  bias  initialization,  the  neural  networks  processed  each  set  of 
observations from five different starting conditions.

Considering the empirical results compiled on Figure IV-24, the statistical
classifier attained the greatest level of classification success. The standard MSNN
algorithm and MSNN Mod 2 were the next most successful, followed by MSNN Mods 1
and 3. At high SNR, perceptron performance was comparable to the other classifiers, but
at increased noise levels it dropped off precipitously.

Results  for  ten-  and  fifty-feature  input  spaces  are  also  shown  as  Figures  IV-25  and 
IV-26.  Due  to  increased  dimensionality,  all  classifiers  performed  equally  well.  In  those 
instances  where  the  performance  of  the  different  classifiers  deviated,  classification  levels 
were  below  ninety-percent.  Therefore,  comparison  of  the  methods  is  inconsequential 
since  all  would  be  considered  unacceptable. 

Overall,  Chapter  IV  sought  to  establish  classifier  feasibility.  Disappointingly,  the 
trial  simulations  did  not  show  a  significant  difference  between  the  MSNN  variants 
studied.  The  next  chapter  attempts  to  make  this  distinction  by  examining  near  real  world 
application  of  these  methods  through  simulation  and  classification  of  modulated 
communication signals.

V. CLASSIFICATION OF MODULATED SIGNALS


The intent of this thesis is to demonstrate the robustness of the MSNN variants in
classifying data to the appropriate signal class. In Chapter IV, the performance of these
neuro-classifiers, as well as that of a quadratic statistical classifier and a perceptron
neural network, was evaluated based on the accuracy attained in categorizing random
vectors composed of artificially simulated features. In this chapter, these classification
tools will be used to separate data objects consisting of features extracted from synthetic
communication signals. The process of feature extraction is introduced prior to
discussing the experimental procedure and simulation results. MATLAB program codes
used during these trials are presented in Appendix C.

A.  FEATURE  EXTRACTION 

By identifying the class to which a signal belongs, classification tools convert
data to information, freeing the operator from the tedium of manually associating objects
to class. Such processes consequently enable the military commander to garner
knowledge and wisdom efficiently, thereby allowing him to more effectively interpret,
predict, and appropriately respond to the environment. In short, these classification tools
increase his situational awareness and improve his decision-making capability.

However,  automating  such  capabilities  is  not  a  trivial  endeavor.  This  thesis  has 
identified  and  demonstrated  tools  that  facilitate  information  and  knowledge  management, 
but  has  neglected  to  specify  how  in  real-world  applications  the  observation  vectors  would 
be  obtained.  Indeed,  “a  major  problem  in  the  area  of  modulation  recognition  is  the 
choice  of  distinctive  marks  for  distinguishing  between  the  different  types  of  modulation 
without  knowledge  of  modulation  parameters”  (Reichert,  1992,  p.221). 

In trying to determine the extraction method to employ, most techniques avoid
time-domain features because they have been shown to lack robustness at low SNR
(Ghani and Lamontagne, 1993, p. 111). A noteworthy exception to this may be the
exploitation of hidden periodicities found in cyclostationary signals. As recognized by
Reichert, attributes of the complex envelope of linearly modulated signals, when mapped
to a single power spectral line by an appropriate transformation, uniquely identify the
underlying modulation type. Moreover, this method is robust in noisy environments
since uncorrelated noise will not add spectral lines that could be read as modulated
signal (Reichert, 1992).

In another approach, the features of interest were counts falling into subdivisions
of the signal plane. Conceptually, this gives an empirical distribution of the observed
data. Then, using a distance metric (the Hellinger distance), this distribution can be
compared to known signal densities. The signal corresponding to the lowest distance
measure is chosen as the class type of the observations (Huo and Donoho, 1998).
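
The count-based idea above can be illustrated with a minimal sketch. It shows only the general principle of comparing an empirical cell distribution against reference densities with the Hellinger distance; it is not the cited authors' procedure, and the cell probabilities, class names, and function names are hypothetical.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2.0)

def empirical_distribution(counts):
    """Convert cell counts into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]

def classify(counts, reference_densities):
    """Return the reference class whose density is closest to the observed counts."""
    observed = empirical_distribution(counts)
    return min(reference_densities,
               key=lambda name: hellinger(observed, reference_densities[name]))

# Made-up cell probabilities for two illustrative signal classes.
references = {
    "class_1": [0.5, 0.5, 0.0, 0.0],
    "class_2": [0.25, 0.25, 0.25, 0.25],
}
label = classify([48, 52, 0, 0], references)
```

The distance is zero for identical distributions and one for distributions with no common support, so the smallest value identifies the best-matching class.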

Despite  interest  in  these  techniques,  their  incompatibility  with  neural  networks 
and  mathematical  complexity  precluded  implementation  in  this  study.  So  instead, 
spectral  characteristics  were  used. 

Several  studies  have  utilized  spectral  coefficients  as  features  for  classification. 
Duzenli  used  time-frequency  characteristics  obtained  through  wavelet  decompositions  to 
categorize  underwater  signals  (Duzenli,  1998),  while  others  used  Fourier  transform 
coefficients  for  analysis  (Ghani  and  Lamontagne,  1993),  (Lallo,  1999).  This  thesis  also 
extracted features from the Fourier domain. The creation of these simulated signals and
execution of empirical trials are discussed next.

B.  SIGNAL  SIMULATION 

1.  Signal  Construction 

The  signal  plane  consisted  of  three  communication  modulation  types  corrupted  by 
varying  degrees  of  additive,  white  Gaussian  noise.  The  model  for  constructing  these 
signal  realizations  is  represented  by  Equation  5.1  as 

x(t)  =  s(t)  +  n(t),  (5.1) 

with  s(t)  being  the  uncorrupted  signal;  n(t),  the  additive  white  Gaussian  noise  component; 
and  x(t),  the  corrupted  signal.  Specifically,  the  three  signal  classes  simulated  were  binary 
amplitude  shift  keying  (2-ASK),  binary  phase  shift  keying  (2-PSK),  and  binary  frequency 
shift  keying  (2-FSK).  The  governing  equations  for  these  signal  types  are 


$s_{ASK}(t) \propto A_k \sin(2\pi f_c t)$ for $0 < t < T$ (5.2)

$s_{PSK}(t) \propto \sin(2\pi f_c t + \phi_k)$ for $0 < t < T$ (5.3)

$s_{FSK}(t) \propto \sin(2\pi (f_c + \Delta f_k) t)$ for $0 < t < T$. (5.4)

All signal types had a carrier frequency, f_c, of 40 MHz and a signal bit period, T, of 10^-7
seconds, resulting in four cycles per message bit. Sampling the continuous signal at 500
MHz gives a discrete time representation of 12.5 samples per cycle or 50 samples per bit.

Different signal realizations were then constructed by encoding random baseband
binary messages with the different modulation types. For 2-ASK, the random message
determined if the signal amplitude, A_k, was zero or one. For 2-PSK, the random message
determined if the phase shift, φ_k, was zero or π radians. For 2-FSK, the random message
determined if the adjacent frequency spacing, Δf_k, was zero or 10 MHz. The normalized
sum of squares over all time-domain components then furnished the signal power of each
realization. Using this signal power, the noise power for the desired SNR level was
determined according to


$\mathrm{SNR} = 10 \log_{10}\left(\frac{P_s}{P_n}\right)$ (5.5)


and  added  to  the  signal  realization  (Equation  5.1).  As  with  the  artificial  feature 
simulations,  SNRs  of  ±20  dB,  ±15  dB,  ±10  dB,  ±5  dB,  and  0  dB  were  considered,  as  well 
as  a  no-noise  case.  The  final  signal  representation  for  each  realization  was  attained  by 
normalizing  each  corrupted  signal  by  its  overall  power  level. 
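
The construction steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis code: constant amplitude scale factors are omitted, time is restarted at each bit boundary, the final normalization is interpreted as scaling to unit average power, and a 20-bit message is used so that each realization holds 1000 samples. All function and variable names are hypothetical.

```python
import math
import random

FC = 40e6                              # carrier frequency (Hz), from the text
T_BIT = 1e-7                           # bit period (s), from the text
FS = 500e6                             # sampling frequency (Hz), from the text
SAMPLES_PER_BIT = round(FS * T_BIT)    # 50 samples per bit

def modulate(bits, kind):
    """Return samples of a 2-ASK, 2-PSK, or 2-FSK signal for a bit sequence."""
    samples = []
    for bit in bits:
        amp = float(bit) if kind == "ASK" else 1.0      # A_k is zero or one
        phase = math.pi * bit if kind == "PSK" else 0.0  # phi_k is zero or pi
        df = 10e6 * bit if kind == "FSK" else 0.0        # delta f_k is zero or 10 MHz
        for n in range(SAMPLES_PER_BIT):
            t = n / FS   # assumption: time restarts within each bit interval
            samples.append(amp * math.sin(2 * math.pi * (FC + df) * t + phase))
    return samples

def power(x):
    """Normalized sum of squares over all time-domain samples."""
    return sum(v * v for v in x) / len(x)

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested SNR, then scale to unit power."""
    noise_power = power(signal) / (10 ** (snr_db / 10))   # Equation 5.5 solved for P_n
    sigma = math.sqrt(noise_power)
    noisy = [s + rng.gauss(0.0, sigma) for s in signal]   # Equation 5.1
    scale = math.sqrt(power(noisy))   # assumed interpretation of the normalization
    return [v / scale for v in noisy]

rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(20)]
x = add_noise(modulate(bits, "PSK"), snr_db=10, rng=rng)
```

Because every realization is rescaled at the end, the omitted amplitude constants do not affect the features extracted later.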

To  extract  the  features  needed  for  classification,  the  time-domain  signals  were 
projected  into  the  Fourier  domain  where  the  spectral  coefficients  directly  relate  to  the 
signal’s  power  spectral  density.  To  identify  the  needed  signal  characteristics,  two 
techniques  were  attempted.  The  more  general  approach  identified  a  signal’s  largest 
spectral  component  and  extracted  those  frequencies  whose  coefficients  exceeded  a  certain 
percentage of this maximum value. Repeated for 100 training realizations of each signal
type, the common frequencies from this set of feature vectors specified the identifying
attributes for each signal class. A compilation of these class characteristics provided the
final feature set and dimensionality for the signal space. The training and testing data
objects of each class would utilize this full description of the signal space, and not just
the features initially selected for the individual class type.
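
The more general selection procedure described above can be sketched as follows. The threshold percentage is not stated in the text, so the value of 0.5 used here is a placeholder, and the toy magnitude spectra, class names, and function names are hypothetical.

```python
def dominant_bins(magnitudes, fraction):
    """Bins whose magnitude is at least `fraction` of the largest component."""
    peak = max(magnitudes)
    return {k for k, m in enumerate(magnitudes) if m >= fraction * peak}

def class_signature(spectra, fraction):
    """Bins that are dominant in every training realization of one class."""
    common = dominant_bins(spectra[0], fraction)
    for magnitudes in spectra[1:]:
        common &= dominant_bins(magnitudes, fraction)
    return common

def signal_space(signatures):
    """Union of the per-class signatures, applied to objects of every class."""
    bins = set()
    for signature in signatures.values():
        bins |= signature
    return sorted(bins)

# Toy magnitude spectra, two realizations per class (values are made up).
training = {
    "A": [[0.1, 1.0, 0.2, 0.1], [0.2, 0.9, 0.1, 0.1]],
    "B": [[0.1, 1.0, 0.1, 0.8], [0.1, 0.9, 0.2, 1.0]],
}
signatures = {name: class_signature(s, fraction=0.5) for name, s in training.items()}
features = signal_space(signatures)
```

Even in this toy case the classes end up described by different numbers of bins, which is the kind of imbalance that can arise with a relative threshold.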

Unfortunately,  this  method  proved  unreliable.  Often  one  or  two  components  may 
typify  a  certain  class,  while  thirty  or  more  may  be  extracted  from  another.  Because  of 
this  disparity,  the  signal  space  did  not  fairly  distinguish  each  class,  especially  those 
represented  by  a  small  number  of  attributes.  Hence,  a  more  rigid  feature  extraction 
scheme  was  considered. 

Previous studies had ascertained that the information needed to discriminate
different modulation types was contained within a window centered on the carrier
frequency (Ghani and Lamontagne, 1993, p. 113). Using a 1000-point discrete Fourier
transform and knowing the sampling frequency, the carrier frequency was found to reside
at bin 80. For the 2-FSK signals, a second predominant spectral spike also appears at bin 100,
corresponding to 50 MHz, the sum of the carrier frequency and adjacent frequency spacing.
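
As a check on the bin locations quoted above, the mapping from frequency to DFT bin can be worked out directly. The calculation assumes zero-based bin indexing; the symbols k, N, and f_s (bin index, transform length, and sampling frequency) are introduced here for illustration.

```latex
k = \frac{f\,N}{f_s}, \qquad
k_{40\,\mathrm{MHz}} = \frac{(40\times10^{6})(1000)}{500\times10^{6}} = 80, \qquad
k_{50\,\mathrm{MHz}} = \frac{(50\times10^{6})(1000)}{500\times10^{6}} = 100
```

Each bin therefore spans f_s/N = 0.5 MHz.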

Knowing the bin location of the 40 MHz carrier frequency, three schemes were
used to extract features from the main and first side lobes of the spectrum. In the first
case, the fifty-one spectral coefficients between bins 30 and 130 (i.e., every other
frequency bin) were used as the extracted features. The second case used the coefficients
of every fourth frequency bin; the last, every tenth bin. Respectively, the second and third
schemes constitute a signal space of twenty-six and eleven input variables. Figures V-1
through V-3 verify that the selected spectral components do distinguish the three signal
classes. Taken for the eleven-feature signal space, these time and spectral representations
of noise-free simulated communication signals specifically show that 2-ASK has more
spectral energy concentrated in the carrier frequency than 2-PSK. The spike at bin 80 is
larger and the side lobes are more subdued for 2-ASK. Also, these two modulation types can
be separated from 2-FSK by the absence of the second frequency spike at bin 100.

Figure V-2. Simulated 2-PSK Signal (no noise), (a) modulated signal vs
sample number (b) enlargement of modulated signal vs sample
number (c) spectral characteristics vs frequency bin (d)
extracted frequency bins.

Figure V-3. Simulated 2-FSK Signal (no noise), (a) modulated signal vs
sample number (b) enlargement of modulated signal vs sample
number (c) spectral characteristics vs frequency bin (d)
extracted frequency bins.
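
The three fixed-stride extraction schemes described above can be sketched as follows. The sketch assumes that the features are DFT magnitudes and that bins are indexed from zero; it evaluates only the required bins with a direct DFT sum, and the function names are hypothetical.

```python
import cmath
import math

def dft_magnitude(x, k):
    """Magnitude of DFT bin k (zero-based) of the sequence x."""
    n_points = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / n_points) for n, v in enumerate(x))
    return abs(acc)

def extract_features(x, step, first_bin=30, last_bin=130):
    """Spectral magnitudes at every `step`-th bin from first_bin to last_bin inclusive."""
    return [dft_magnitude(x, k) for k in range(first_bin, last_bin + 1, step)]

# Noise-free 40 MHz carrier sampled at 500 MHz, 1000 samples.
carrier = [math.sin(2 * math.pi * 40e6 * n / 500e6) for n in range(1000)]

# Strides of 2, 4, and 10 bins give 51, 26, and 11 features, respectively.
sizes = {step: len(extract_features(carrier, step)) for step in (2, 4, 10)}
features_11 = extract_features(carrier, 10)
peak_index = features_11.index(max(features_11))   # position of bin 80 in the 11-feature vector
```

For the unmodulated carrier, the only significant coefficient in the eleven-feature vector is the one taken at bin 80.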

Examples of noise-corrupted signals are shown on Figures V-4 through V-6 for an
SNR of 20 dB, and on Figures V-7 through V-9 for an SNR of 10 dB. In these figures,
plot (a) depicts a sample of the uncorrupted normalized time-domain signal versus
sample number; plot (b), the noise-corrupted version versus sample number. Plot (c)
shows the spectral characteristic of the corrupted signal as a function of frequency bin,
while plot (d) displays the frequency bins chosen for an eleven-feature input space.

In  retrospect,  however,  the  chosen  frequencies  should  have  been  more  judiciously 
selected,  such  as  through  a  principal  component  analysis  or  other  feature  reduction 
method  that  more  compactly  describes  the  signal  space  (Duzenli,  1998),  (Duzenli  and 
Fargues,  1998),  (Fargues  and  Duzenli,  1998),  (Brunzell  and  Eriksson,  1999).  Not  having 
done  so  led  to  inconclusive  results  for  classification  of  noise-corrupted  signals. 

Lastly, recognize that a rudimentary communication signal model corrupted by
only additive, white Gaussian noise was considered. More complex modulation schemes,
multi-path receptions, intersymbol interference, interlaced signals, and different fading
environments would make for enhanced simulation realism. In addition, other digital
signal types, such as radar, optical, and acoustic, could have been substituted for the ones
implemented here. These factors can be explored in follow-on studies.

Figure V-5. 2-PSK Signal, (a) enlargement of modulated signal vs sample number
(b) enlargement of corrupted signal vs sample number (SNR = 20 dB)
(c) spectral characteristics vs frequency bin (d) extracted frequency bins.

Figure V-6. 2-FSK Signal, (a) enlargement of modulated signal vs sample number
(b) enlargement of corrupted signal vs sample number (SNR = 20 dB)
(c) spectral characteristics vs frequency bin (d) extracted frequency bins.

Figure V-7. 2-ASK Signal, (a) enlargement of modulated signal vs sample number
(b) enlargement of corrupted signal vs sample number (SNR = 10 dB)
(c) spectral characteristics vs frequency bin (d) extracted frequency bins.

Figure V-8. 2-PSK Signal, (a) enlargement of modulated signal vs sample number
(b) enlargement of corrupted signal vs sample number (SNR = 10 dB)
(c) spectral characteristics vs frequency bin (d) extracted frequency bins.

Figure V-9. 2-FSK Signal, (a) enlargement of modulated signal vs sample number
(b) enlargement of corrupted signal vs sample number (SNR = 10 dB)
(c) spectral characteristics vs frequency bin (d) extracted frequency bins.

2.  Simulation  Protocol 

The  test  procedure  used  to  classify  the  simulated  communication  signals  was  the 
same  as  that  used  for  the  artificial  signal  features.  Using  the  process  described  above, 
100  training  and  1000  testing  data  objects  were  created  for  each  signal  type  per  trial,  with 
the  set  of  simulation  trials  encompassing  all  combinations  of  SNR  and  signal  space  size. 
As before, these feature vectors were normalized (Equation 3.28) for use by the MSNN
variant  that  required  preconditioned  input  data  (MSNN  Mod  1)  and  the  covariance 
matrices  of  the  training  observations  were  calculated  for  use  by  the  statistical  classifier. 

The  statistical  classifier  processed  ten  data  sets  of  300  training/3000  testing 
vectors  each.  For  the  neural  networks,  five  sets  of  realizations  were  created;  but  because 
of  neuro-classifier  dependence  on  initial  conditions,  each  data  set  was  processed  five 
times  with  varying  starting  weights  and  bias. 

Section  V.C  reports  the  findings  of  these  trials. 

C.  SIMULATION  RESULTS 

Results for the communication signal simulations are detailed in Appendix B,
Tables B-19 through B-36 and Figures B-1 through B-6. For Tables B-19 through B-36,
π1, π2, and π3 refer to 2-ASK, 2-PSK, and 2-FSK, respectively.

Unlike the simulations conducted in Chapter IV, the no-noise case could be
examined for the synthetic communication signals constructed. The results for these
trials are included in Appendix B and summarized here in Table V-1. This table indicates
that under no-noise conditions, the standard MSNN algorithm outperformed all other
classifiers, with MSNN Mod 3 being almost as accurate. In particular, Table V-1 does
not substantiate the improvements expected of the MSNN Mod 2 variant. It does,
however, corroborate the Chapter IV findings for the MSNN Mod 1 variant. As before,
the input preconditioning approach proved to be the least successful of the MSNN variants
in classifying the generated signals. Chapter IV results also indicated that the statistical
classifier most successfully identified test objects. Table V-1, however, does not support this
conclusion, showing instead that the quadratic classifier performed the least accurately.


Classification Method     11 Features    26 Features    51 Features
Statistical Classifier        57.0           33.3           33.3
Perceptron                    83.8           87.8           92.9
MSNN                          94.3           93.8           94.8
MSNN Mod 1                    45.1           64.4           63.0
MSNN Mod 2                    91.4           92.2           92.8
MSNN Mod 3                    92.6           93.1           94.0

Table V-1. Simulated Signal No-Noise Performance Results (Ave Percent Correct Classification).

To better understand the decline in statistical classifier performance as well as the
results obtained with noise-corrupted signals, it is worthwhile to revisit Figures V-4
through V-9. Although the no-noise representations of these signals (Figures V-1 through
V-3) clearly characterize the signal classes, the noise-corrupted plots show similarities in
the feature descriptions of the different signal types, particularly between 2-ASK and
2-PSK. Comparing the 20 dB realizations of Figures V-4(d) and V-5(d), only the center
frequency amplitudes differentiate the two modulation schemes. Coefficients of the
remaining bins have approximately the same magnitude. When the 2-FSK signal is
considered (Figure V-6(d)), the only significant distinction between the signal classes
occurs at bins 80 and 100, the two carrier frequencies of the 2-FSK modulation scheme.
The same observations apply to the 10 dB examples.

Now considering Figures B-1 through B-6, the lack of distinguishing features
between class types explains the poorer results obtained with the noise-corrupted
simulated signal data as compared to the artificial features of Chapter IV. The reduced
distinction between modulation types increased classifier confusion, thereby degrading
classification performance. Furthermore, altering the signal space dimension did not
affect the average correct classification percentage of the MSNN variants, suggesting that
the information needed to separate the classes resided in a smaller number of features
(Figures B-3 through B-6).

For the statistical classifier, the over-parameterized input space illustrates the
curse of dimensionality (Bishop, 1995, p. 7). Unlike the neural classifiers, which showed
improved (albeit marginal) performance with increased signal space size, the quadratic
classifier exhibited poorer results (Figure B-1). These degraded results were attributed to
ill-conditioning of the data matrix caused by a linear dependency among the chosen features.
This supposition was verified by performing a principal component analysis (PCA) that
reduced the feature space size (Bishop, 1995, pp. 310-311). Doing so resulted in the
improved classifier results of Table V-2.
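
The conditioning argument above can be illustrated numerically. The following is a minimal sketch, not the procedure used in this study: it assumes synthetic Gaussian data in which seven of eleven features are exact linear combinations of the other four, and the names `features`, `mixing`, and the 1e-8 eigenvalue cutoff are hypothetical choices made only for illustration. It shows that the full covariance matrix is numerically singular, while the covariance of the data projected onto the leading principal components is well conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 4 informative features plus 7 exact linear combinations,
# mimicking an over-parameterized 11-feature space (an assumption, not thesis data).
n_samples, n_informative, n_total = 500, 4, 11
base = rng.normal(size=(n_samples, n_informative))
mixing = rng.normal(size=(n_informative, n_total - n_informative))
features = np.hstack([base, base @ mixing])

# Covariance of the full feature set and its eigen-decomposition.
cov_full = np.cov(features, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov_full)        # eigenvalues in ascending order
n_significant = int(np.sum(eigvals > 1e-8))        # illustrative cutoff

# PCA: project the centered data onto the eigenvectors with significant eigenvalues.
order = np.argsort(eigvals)[::-1][:n_significant]
reduced = (features - features.mean(axis=0)) @ eigvecs[:, order]
cov_reduced = np.cov(reduced, rowvar=False)

print("significant eigenvalues:", n_significant, "of", n_total)
print("condition number before PCA:", np.linalg.cond(cov_full))
print("condition number after PCA: ", np.linalg.cond(cov_reduced))
```

A quadratic classifier must invert each class covariance matrix, so a condition number of the size printed for the full feature set makes the inverse unreliable; the reduced matrix does not have this problem.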


      Features             No Noise            SNR 20 dB
Retained   Initial     Before    After     Before    After
    4         51        33.3      93.5      75.4      87.5
    4         26        33.3      93.1      79.6      87.0
    4         11        56.3      55.1      79.0      81.9
    6         51        33.3      53.0      75.5      89.3
    6         26        33.3      44.5      79.3      87.1
    6         11        61.6      54.2      78.8      81.3

Table V-2. Statistical Classifier Performance Before and After Data
Conditioning (Ave Percent Correct Classification).


Table V-2 confirms that the signal space was originally over-parameterized. In
nearly all cases, the percentage of correct classifications increased, with significant gains
observed in the no-noise case for feature reductions from fifty-one and twenty-six to four.
Only the no-noise, eleven-to-four and eleven-to-six reductions resulted in moderately
poorer results. The results obtained by the eleven-to-four component reduction can be
attributed to statistical variance. It is expected that conducting more trials would show
no change attributable to data space conditioning. For the eleven-to-six reduction, the
declining results are caused by selecting a basis set that increased the ambiguity between the
distinct class data distributions, thereby incurring a loss of distinguishing information.
But regardless of these instances, pre-processing of the input data through PCA
techniques generally improved statistical classifier performance. Results validating this
enhancement over all SNR conditions are included in Figure B-1.

Fortunately, the signal space over-parameterization that necessitated data
pre-processing to obtain adequate statistical classifier performance has less effect on neural
network accuracy. Granted, judicious feature extraction by methods such as principal
component analysis improves neuro-classifier results; but intensive pre-processing is not
essential since non-parametric classifiers let the “data speak for itself” (Haykin, 1994, p.
23). In addition, the over-parameterized feature space does not favor any particular
neural network architecture and, hence, simulation results can be compared. Figures V-10
through V-12 compile the data of Figures B-1 through B-6 to provide this contrast of
classifier capabilities.

Although all noise-corrupted simulated signal trials were inadequate based on the
ninety-percent correct classification criterion stipulated in Chapter IV, Figures V-10
through V-12 do allow comparison of classifier performance. For instance, these
graphs show that without input data conditioning by eigenvalue or other feature reduction
techniques, the statistical classifier performed worse than all mean separator approaches
except MSNN Mod 1 in the signal spaces considered. Using a principal component
technique to reduce the input to four features, however, improved the statistical classifier
accuracy to the same level as these MSNN methods.

Figures V-10 through V-12 also show that the perceptron performed worse than
the MSNN variants in most cases. To account in part for this lower accuracy, Table V-3
lists the percentage of perceptron non-type classifications for each simulation trial. As
before, this poor classification performance by the perceptron is attributed to the neural
network’s inability to establish viable class separation.

Figure V-10. Performance Comparison: Simulated Signals (11 features).
(Axis label: Ave Correct Classification (%).)


SNR (dB)       11 Features   26 Features   51 Features
[illegible]        3.6           3.4           0.7
   20              37           11.1           8.5
   15             23.8           7.3       [illegible]
   10              4.2           9.2       [illegible]
    5              9.9           6.0           7.5
    0              6.7          15.3          14.7
   -5             15.0          16.8          10.7
  -10              9.6          15.0          14.4
  -15             17.5           8.8           6.9
  -20             10.6           8.9          10.9

Table V-3. Observed Percentage of Perceptron Non-Type Classification.

In  addition,  these  figures  further  substantiate  the  insufficiency  of  MSNN  Mod  1. 
All  plots  show  poorer  performance  for  this  MSNN  variant  as  compared  to  the  other 
MSNN  techniques,  with  this  degraded  classification  being  attributed  to  the  inherent 
similarity  in  the  2-ASK  and  2-PSK  signal  descriptions  and  greater  feature  space  data 
overlap  resulting  from  input  normalization. 

With regard to the remaining MSNN variants, the outcome from trials conducted
with noise-corrupted signals failed to conclusively identify which was more accurate.
The simulation results were nearly identical. This, however, does not suggest a
conceptual flaw in MSNN Mods 2 and 3, but rather indicates inadequate training. As
before, network training for these modified techniques stopped on the maximum epoch limit
rather than on a satisfied VMR threshold. Therefore, the networks were not effectively
trained to classify follow-on observations. Once more, increasing the epoch limit, refining the
learning rate methodology, and softening the VMR threshold may provide for MSNN
performance distinction.

As final evidence of classifier performance, MSNN neuron maps for the SNR and
feature space conditions of Figures V-4 through V-9 are provided. Shown as Figures V-13
through V-20, these plots support the findings just described. Of particular interest,
Figures V-14 and V-18 demonstrate the inadequacy of MSNN Mod 1 by the
non-uniformity of the neuron maps. In addition, the neuron maps for the remaining MSNN
variants illustrate the similarity in 2-ASK and 2-PSK specifiers that resulted in equivalent
performance plots.

(Neuron map panel titles: Neuron Map [2,3], Neuron Map [1,3], Neuron Map [1,2].)

Figure V-15. MSNN Mod 2 Neuron Map of 11-Features Simulated Signal Data (SNR = 20 dB).

Figure V-17. MSNN Neuron Map of 11-Features Simulated Signal Data (SNR = 10 dB).

Figure V-18. MSNN Mod 1 Neuron Map of 11-Features Simulated Signal Data (SNR = 10 dB).

Figure V-19. MSNN Mod 2 Neuron Map of 11-Features Simulated Signal Data (SNR = 10 dB).

Figure V-20. MSNN Mod 3 Neuron Map of 11-Features Simulated Signal Data (SNR = 10 dB).

D. SUMMARY


Chapter  V  investigated  the  classification  of  software-generated  communication 
signals  in  varying  levels  of  noise.  For  the  six  classification  methods  discussed  in  this 
study, 100 testing and 1000 training realizations of 2-ASK, 2-PSK, and 2-FSK signals
were  created  by  encoding  random  binary  messages.  The  experimental  protocol  followed 
the  one  used  in  Chapter  IV.  The  quadratic  classifier  catalogued  ten  sets  of  data,  while  the 
neural  networks  processed  only  five.  The  neural  networks,  however,  processed  each  data 
set  five  times  from  different  initial  conditions. 

Figures  V-10  through  V-12  indicate  that  all  trials  were  inaccurate  (i.e.,  less  than 
ninety-percent  correct  classification  success).  This  observation,  however,  is  not  due  to 
the  classifiers  themselves,  but  to  the  feature  space  definition.  A  more  prudent  selection 
would  have  included  parameters  that  more  distinctly  differentiated  the  2-ASK  and  2-PSK 
signals.  This  not  being  the  case,  the  simulation  results  showed  a  high  degree  of 
misclassification  between  these  two  modulation  types. 

Yet,  the  primary  emphasis  of  this  investigation  was  not  to  accurately  categorize 
observations,  but  to  compare  classifier  capabilities.  For  instance,  analyzing  noise-free 
signal  data  proved  that  the  standard  MSNN  algorithm  performed  best.  Furthermore, 
when  considering  noise-corrupted  data,  none  of  the  proposed  MSNN  schemes  showed 
substantial improvement over the standard approach. In particular, MSNN Mod 1
delivered inferior results due to the aforementioned feature description similarity in the
2-ASK and 2-PSK signals and increased data overlap caused by signal space normalization.
The  remaining  MSNN  methods  produced  outcomes  comparable  to  the  original  MSNN 
formulation.  Hence,  no  noteworthy  advantage  was  realized  by  the  proposed  changes  to 
the  standard  MSNN  algorithm. 

The MSNN techniques did fare markedly better than the perceptron neural
network. Without a priori knowledge of the data set or optimal selection of signal
features, the mean separators also performed better than the statistical classifier. Granted,
when the input data was conditioned by feature space enhancing techniques such as the
eigenvalue methods used here, dramatic gains in quadratic classifier performance were
realized. But, for the principal component reduction utilized, this improved outcome did
not exceed the mean separator results, substantiating the greater utility of neural
networks, in general, and the MSNN, in particular.
VI. CONCLUSIONS


A.  SUMMARY  OF  WORK 

The  age  of  enhanced  digital  data  collection  and  distribution  requires  electronic 
information  management  techniques  that  will  assist  and  not  hinder  the  warfighter.  These 
applications  must  be  rapid,  reliable,  and  automated.  This  thesis  investigated  the 
continued  development  of  one  such  tool. 

The  Mean  Separator  Neural  Network  (MSNN)  had  previously  been  applied  to  the 
classification  of  underwater  signals.  This  study  modified  the  MSNN  and  evaluated  the 
performance  of  these  variants  in  categorizing  software  simulated  signals.  Starting  with  a 
general  introduction  to  neural  networks,  classification  techniques  were  introduced  and 
explained.  In  addition  to  the  original  MSNN  developed  by  Duzenli  and  Fargues,  two 
non-MSNN  schemes  were  utilized  as  benchmarks  to  gauge  proposed  methods.  The  first 
considered  was  a  pure  parametric  statistical  classifier;  specifically,  a  quadratic  classifier. 
The  decision  rule  for  this  statistical  classifier  was  derived  for  later  use: 

The second benchmark implemented was a single-layer perceptron neural
network. The underlying concept of the perceptron was explained and its fundamental
processing element constructed. In particular, the decision rule for perceptron
neuro-classification was presented. Classifying with the perceptron, however, first required
training the network to discriminate the different class types. Hence, the perceptron
learning rule and its role in network training were discussed. Finally, the disadvantages of
the perceptron networks were identified as limitations due to the use of linear decision
boundaries and the lack of solution optimization techniques. As an addendum, the
Fixed-Increment Theorem of perceptrons was developed for edification. This precept specifies
that for certain problem types, the perceptron neural networks will converge to a solution
in a finite number of steps.
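
The role of the linear decision boundary can be seen in a small experiment. The sketch below is illustrative only and is not the network configuration used in this study: it assumes a single hard-limit neuron trained with the rule w ← w + e·p, b ← b + e, and the helper names `hardlim` and `train_perceptron`, the epoch limit, and the two toy problems (logical AND and XOR) are hypothetical. AND is linearly separable, so training ends with an error-free pass; XOR is not, so no such pass can occur.

```python
import numpy as np

def hardlim(n):
    """Hard-limit activation: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

def train_perceptron(inputs, targets, max_epochs=100):
    """Train one hard-limit neuron with the rule w += e*p, b += e.

    Returns (weights, bias, converged); converged is True once a full
    pass over the training set produces no errors.
    """
    w = np.zeros(inputs.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for p, t in zip(inputs, targets):
            e = t - hardlim(w @ p + b)   # error is -1, 0, or +1
            if e != 0:
                w = w + e * p
                b = b + e
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False

patterns = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
and_targets = np.array([0, 0, 0, 1])   # linearly separable
xor_targets = np.array([0, 1, 1, 0])   # not linearly separable

_, _, and_converged = train_perceptron(patterns, and_targets)
_, _, xor_converged = train_perceptron(patterns, xor_targets)
print("AND converged:", and_converged)
print("XOR converged:", xor_converged)
```

Raising `max_epochs` does not help in the XOR case; an error-free pass would require a separating line that does not exist.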

The central emphasis of this proof-of-concept study was enhanced implementation
of the MSNN. But, to better understand these improvements, the standard MSNN
classification scheme was first explained. The goal of the MSNN is to maximize the
mean separation of data projected into a decision space. The mathematical method for
achieving this objective was presented as a basis for understanding the design of the
MSNN neural processing element. Then, using this fundamental building block, the
study next examined the three stages of solving a classification problem with the MSNN:
training, typing, and decision-making.

Network  training  was  accomplished  using  a  steepest  descent  algorithm  in  which 
the  training  trajectory  was  governed  by  the  mean-difference  projection  index,  MD.  This 
training  algorithm  also  employed  a  dynamic  learning  rate  rule  to  control  the  training 
trajectory. 

After  training  the  network,  typing  was  completed  by  using  the  mean  separator 
equation  to  assign  a  unique  numerical  sequence  to  each  class.  In  the  decision-making 
stage,  these  class  specifiers  are  then  compared  to  the  network  output  of  subsequent  data 
to  identify  the  uncategorized  observations. 

By  merely  focusing  on  maximizing  mean  separation,  however,  the  MSNN  fails  to 
recognize  the  impact  of  data  variance.  Indeed,  wide  mean  separation  may  be 
inconsequential  if  data  spread  is  equally  large.  Conversely,  a  small  difference  in 
projection  space  means  could  be  acceptable  for  tightly  clustered  data.  Because  of  this, 
three  modifications  to  the  MSNN  algorithm  were  proposed  and  evaluated. 

The first MSNN variant (MSNN Mod 1) suggested that MSNN performance may
be improved by pre-processing the input data. By normalizing the data about its mean,
we endeavored to tighten the input data distribution and reduce data overlap in the feature
space. Mapping these distributions into the decision space would then result in greater
precision about the optimal values and, thus, less intersection of the decision space
distributions and greater classification accuracy. Unfortunately, it was recognized that this
may not be the case. Input data normalization may increase input data diffusion, and
transformation into the decision space may not preserve cluster cohesion.

The second MSNN variant (MSNN Mod 2) sought to improve mean separator
performance by normalizing the projection space instead of the input space. Essentially,
the concept entailed maximizing projection data mean spread relative to projection data
variance. Doing this provides for thorough evaluation of the projection data distributions.
Because of this, a large mean separation may or may not be beneficial, depending on how
accurately the input data was mapped into the decision space. That is, a data projection
resulting in a large mean difference may be meaningless if the projected data variance
is also large. Conversely, small mean separation could be tolerable for
instances of small data variance.
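
The trade-off between mean separation and variance can be made concrete with a short numerical sketch. It is not the MD2 or VMR formulation of this study: the Gaussian projections, the midpoint decision threshold, and the helper names `midpoint_error` and `separation_ratio` (mean difference divided by pooled standard deviation) are assumptions introduced purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def midpoint_error(proj_a, proj_b):
    """Fraction of samples on the wrong side of the midpoint between class means."""
    threshold = 0.5 * (proj_a.mean() + proj_b.mean())
    if proj_a.mean() < proj_b.mean():
        wrong = np.sum(proj_a > threshold) + np.sum(proj_b < threshold)
    else:
        wrong = np.sum(proj_a < threshold) + np.sum(proj_b > threshold)
    return wrong / (proj_a.size + proj_b.size)

def separation_ratio(proj_a, proj_b):
    """Mean difference scaled by pooled standard deviation (illustrative index only)."""
    pooled_std = np.sqrt(0.5 * (proj_a.var() + proj_b.var()))
    return abs(proj_a.mean() - proj_b.mean()) / pooled_std

n = 5000
# Case 1: wide mean separation, but an equally wide spread.
wide_a, wide_b = rng.normal(0.0, 10.0, n), rng.normal(10.0, 10.0, n)
# Case 2: small mean separation, tightly clustered data.
tight_a, tight_b = rng.normal(0.0, 0.2, n), rng.normal(2.0, 0.2, n)

for name, a, b in [("wide", wide_a, wide_b), ("tight", tight_a, tight_b)]:
    print(name,
          "mean difference:", round(abs(a.mean() - b.mean()), 2),
          "ratio:", round(separation_ratio(a, b), 2),
          "midpoint error:", round(midpoint_error(a, b), 4))
```

The first case has roughly five times the mean difference of the second yet misclassifies far more samples, which is the behavior a mean-only projection index cannot detect.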

Implementation  of  this  model,  however,  was  not  as  straightforward  as  that  of  the 
pre-conditioned  input  variant.  Whereas  the  pre-conditioned  input  data  method  only 
required  normalizing  the  feature  space  and  adjusting  the  decision  scheme,  accounting  for 
projection  space  variance  necessitated  deriving  a  new  performance  index  (MD2)  and 
training  termination  parameter  (VMR). 

The third MSNN variant investigated (MSNN Mod 3) used the projection
index of the standard MSNN algorithm coupled with the new training termination
parameter developed for the normalized projection space method.

Utilizing  these  six  classification  methods,  two  types  of  trials  were  conducted.  In 
the  first,  random  vectors  composed  of  simulated  feature  components  were  generated. 
Classifier  performance,  reported  as  a  percentage  value,  was  measured  as  the  accuracy 
obtained  in  properly  categorizing  test  data  of  known  class  type.  In  general,  increased 
SNR  and  feature  space  dimensionality  produced  improved  classifier  performance  for  all 
techniques. Of the benchmarks used, the statistical classifier had the best classification
results; the perceptron, the worst for all but the largest feature space trials.

The  MSNN  variants  produced  inconclusive  results.  MSNN  Mod  1  performed 
markedly  worse  with  a  small  feature  space  size.  But  as  feature  space  dimensionality 
became  larger,  input  data  preconditioning  delivered  significantly  better  results.  The 
classification  performance  of  MSNN  Mod  2  equaled  that  of  the  standard  MSNN 
approach.  This  lack  of  significant  improvement  was  predominately  due  to  the  MSNN 
Mod  2  networks  not  being  adequately  trained.  Network  training  often  terminated  on 
maximum  epoch  cycles  rather  than  VMR  threshold.  This  same  reason  also  partly 
explains  the  classification  performance  of  MSNN  Mod  3. 

With a rudimentary understanding of each classifier’s capabilities in hand, a
second set of trials tested their performance with software-simulated communication
signals. Specifically, three types of binary modulation schemes were implemented:
2-ASK, 2-PSK, and 2-FSK.

As before, the perceptron had the worst classification results. The statistical
classifier,  however,  did  not  demonstrate  the  best  performance.  In  fact,  unlike  the  other 
techniques,  the  quadratic  classifier  showed  lower  accuracy  with  increased  feature  space 
size.  This  tendency  was  due  to  a  correlation  between  feature  space  components,  resulting 
in  an  ill-conditioned  covariance  matrix.  Extracting  the  principal  components  to  reduce 
the  input  dimensionality  dramatically  improved  statistical  classifier  performance. 

Examining the outcome of no-noise trials, the standard MSNN methodology
outperformed all other classifiers. For noise-corrupted signals, however,
simulation results were, as in Chapter IV, inconclusive. MSNN Mod 1 did consistently
present the worst results, presumably due to the similarity in 2-ASK and 2-PSK feature
components and increased signal space data diffusion caused by normalization. All other
methods were essentially equivalent. The lack of improvement from MSNN Mods 2 and
3 was ascribed to inadequate network training.

B.  SUGGESTIONS  FOR  FUTURE  RESEARCH 

The  intent  of  this  thesis  was  to  propose  and  validate  modifications  to  the  MSNN 
classifier.  Three  such  modifications  were  presented.  When  considering  noise-corrupted 
signals,  none  showed  significant  improvement  over  the  standard  MSNN  approach.  In 
particular,  MSNN  Mod  2,  which  emphasized  projection  data  variance  in  addition  to  mean 
separation,  only  performed  as  well  as  the  standard  MSNN  algorithm.  This  lack  of  proof 
of  concept,  however,  is  not  due  to  discrepancies  in  the  underlying  fundamentals  of  the 
approach,  but  rather  to  method  implementation.  In  particular,  two  aspects  deserve  further 
consideration. 

One likely cause of inadequate network training using the MSNN Mod 2 variant
is reaching the maximum epoch limit prior to satisfying the VMR threshold.
Therefore, to improve the performance of the MSNN projection space normalization
scheme, the maximum epoch setpoint and learning rate rules require thorough
investigation. With regard to the latter, instead of using an adaptive learning rate
approach, starting with a static learning rate (i.e., one that is only dependent on the
gradient of the performance parameter) may provide better results when compared to the
standard mean separator.

In  addition,  it  may  also  be  instructive  to  reduce  the  stringency  of  the  VMR 
threshold.  As  used  in  this  study,  a  VMR  value  of  zero  equates  to  0.5%  class  overlap, 
assuming  normally  distributed  data.  Furthermore,  the  termination  requirement  sets  the 
VMR  threshold  at  0.90.  This  combination  of  overlap  and  ratio  may  be  unnecessarily 
restrictive. Therefore, studies could be conducted to empirically establish justifiable
values.

The  termination  requirement  for  MSNN  Mod  2  should  also  be  re-evaluated. 
VMR  was  used  as  a  training  terminator  only  if  the  projection  index  (MD2  for  MSNN 
Mod  2  and  MD  for  MSNN  Mod  3)  showed  training  movement  towards  an  improved 
solution.  For  MSNN  Mod  2  this  would  have  become  apparent  in  the  VMR  value  itself. 
Therefore,  the  requirement  to  show  decreasing  performance  parameter  values  is 
unnecessary.  For  MSNN  Mod  3,  the  performance  parameter  only  takes  into  account 
projected  data  mean  separation.  By  neglecting  to  consider  data  variance,  the  underlying 
principle  of  VMR  is  disregarded  since  improved  conditions  could  result  when  mean 
separation  decreases  (provided  the  relative  decrease  in  data  variance  is  greater). 

Because  of  this  inadequacy  of  MSNN  Mod  3,  it  may  have  been  more  beneficial  to 
use  VMR  as  the  performance  parameter  instead  of  either  of  the  two  mean-difference 
equations.  This  performance  parameter  would  essentially  be  the  reciprocal  of  MD2.  As 
such,  the  difficulties  encountered  due  to  the  infinitesimally  small  projection  variances 
(i.e.,  division  by  zero)  would  be  avoided. 

Once  an  optimal  mean  separator  algorithm  has  been  determined,  the  modified 
MSNN  classifier  could  be  used  to  identify  real-world  signals  (e.g.,  radar,  communication, 
acoustic).  This  would,  however,  require  a  high  degree  of  classifier  accuracy.  Recall  that 
the  intent  of  this  investigation  was  to  compare  proposed  alterations  to  the  MSNN 
algorithm. As such, absolute classifier accuracy was not the aim; rather, relative classifier
accuracy  was.  If  a  high  degree  of  absolute  classifier  accuracy  is  desired  (such  as  for 
categorizing  real-world  signals),  judicious  feature  extraction  schemes  and  pre-processing 
techniques  are  needed.  When  proved  successful,  the  modified  MSNN  classifier  utilizing 
this  refined  feature  selection  approach  can  then  be  expanded  from  a  software  application 
to  direct  implementation  on  an  integrated  circuit.  Having  such  a  device  would  greatly  aid 
the  operational  commander  in  understanding  the  battlespace  and  making  critical 
decisions. 
APPENDIX A. FIXED-INCREMENT CONVERGENCE THEOREM


Rosenblatt reasoned that for a single-layer perceptron applied to linearly separable
problems, a solution can be determined in a finite number of iterations. Stated formally,
this fixed-increment convergence theorem asserts:

Let the subsets of training vectors $X_1$ and $X_2$ be linearly separable. Let the
inputs presented to the single-layer perceptron originate from these two
subsets. The perceptron converges after some $n_0$ iterations, in the sense
that

$$\mathbf{w}(n_0) = \mathbf{w}(n_0 + 1) = \mathbf{w}(n_0 + 2) = \cdots$$

is a solution vector for $n_0 \le n_{\max}$. (Haykin, 1994, p. 111).

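To prove this theorem, the following vector notation is used for convenience:

$$\mathbf{x} = \begin{bmatrix} \mathbf{w} \\ b \end{bmatrix} \quad \text{and} \quad \mathbf{z} = \begin{bmatrix} \mathbf{p} \\ 1 \end{bmatrix}. \tag{A.1}$$

Using this notation, the input to the hardlim activation function, $n$, can be expressed as

$$n = \mathbf{w}^T \mathbf{p} + b = \mathbf{x}^T \mathbf{z}. \tag{A.2}$$

Similarly, the perceptron learning rule Equations 3.11 and 3.12 can be combined into the
single vector equation

$$\mathbf{x}^{new} = \mathbf{x}^{old} + e\,\mathbf{z}. \tag{A.3}$$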

Given a solution $\mathbf{x}^*$ to the classification problem,

$$n = \mathbf{x}^{*T} \mathbf{z} \; \begin{cases} > \delta > 0 & \text{if } t = 1 \\ < -\delta < 0 & \text{if } t = 0. \end{cases} \tag{A.4}$$

Equation A.4 implies that there exists a positive $\delta$ less than the magnitude of the inner
product $n$ for both target output possibilities.

After $k$ training iterations, the perceptron learning rule (Equation A.3) results in
an updated solution given by

$$\mathbf{x}(k) = \mathbf{z}'(k-1) + \mathbf{z}'(k-2) + \cdots + \mathbf{z}'(0), \tag{A.5}$$
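where the prime (') accounts for the possible error values 0 and ±1, that is,
$\mathbf{z}'(i) = e(i)\,\mathbf{z}(i)$, and it is assumed that $\mathbf{x}(0) = \mathbf{0}$.
Taking the inner product of the solution vector $\mathbf{x}^*$ with Equation A.5 yields

$$\mathbf{x}^{*T}\mathbf{x}(k) = \mathbf{x}^{*T}\mathbf{z}'(k-1) + \mathbf{x}^{*T}\mathbf{z}'(k-2) + \cdots + \mathbf{x}^{*T}\mathbf{z}'(0) \tag{A.6}$$

and using the inequality relationships of Equation A.4 in Equation A.6 leads to

$$\mathbf{x}^{*T}\mathbf{x}(k) > k\delta, \tag{A.7}$$

with $\delta$ chosen below the minimum of $\mathbf{x}^{*T}\mathbf{z}'(i)$ taken over the iterations at
which the weights change (iterations with zero error leave $\mathbf{x}$ unchanged, so $k$ is
understood to count only the updates). With the Cauchy-Schwarz inequality, a lower
bound on the square of the weight vector $\mathbf{x}(k)$ is therefore found to be

$$\|\mathbf{x}(k)\|^2 \ge \frac{\left(\mathbf{x}^{*T}\mathbf{x}(k)\right)^2}{\|\mathbf{x}^*\|^2} > \frac{(k\delta)^2}{\|\mathbf{x}^*\|^2}. \tag{A.8}$$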


To find the upper bound for the square of the weight vector at iteration $k$,
Equation A.3 is substituted into the length equation:

$$\begin{aligned} \|\mathbf{x}(k)\|^2 &= \mathbf{x}^T(k)\,\mathbf{x}(k) = [\mathbf{x}(k-1) + \mathbf{z}'(k-1)]^T\,[\mathbf{x}(k-1) + \mathbf{z}'(k-1)] \\ &= \|\mathbf{x}(k-1)\|^2 + \|\mathbf{z}'(k-1)\|^2 + 2\,\mathbf{x}^T(k-1)\,\mathbf{z}'(k-1). \end{aligned} \tag{A.9}$$

When proper classification occurs, the cross-term in Equation A.9 will be zero. If
misclassification occurs, this term will be negative (or zero). Hence, Equation A.9 can be
rewritten as an inequality:

$$\|\mathbf{x}(k)\|^2 \le \|\mathbf{x}(k-1)\|^2 + \|\mathbf{z}'(k-1)\|^2. \tag{A.10}$$

Repeating this derivation for all previous iterations of $\|\mathbf{x}(i)\|^2$, the upper bound on the
square of the weight vector is found to be

$$\|\mathbf{x}(k)\|^2 \le \|\mathbf{z}'(0)\|^2 + \|\mathbf{z}'(1)\|^2 + \cdots + \|\mathbf{z}'(k-1)\|^2 \le kA, \tag{A.11}$$

where $A$ is the maximum of $\|\mathbf{z}'(i)\|^2$.

Finally, combining Equations A.8 and A.11 results in a closed-form bound on
the number of iterations, $k$, required for perceptron convergence:

$$\frac{(k\delta)^2}{\|\mathbf{x}^*\|^2} < \|\mathbf{x}(k)\|^2 \le kA \quad \Longrightarrow \quad k < \frac{A\,\|\mathbf{x}^*\|^2}{\delta^2}. \tag{A.12}$$

The assumptions made to arrive at this conclusion were that (1) a solution is known to
exist and (2) the length of the input vectors is upper-bounded (Hagan, et al., 1996, pp.
4-15 to 4-18).
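
The bound on the number of weight updates can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original derivation: the solution vector `x_star`, the uniformly drawn inputs, and the 0.1 margin filter are hypothetical, chosen only so that a separating solution with a known margin exists. It trains a hard-limit perceptron from x(0) = 0, counts the weight updates, and compares the count with A·||x*||² / δ².

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical separable problem: a known solution x* = [w*; b*] labels the data.
x_star = np.array([2.0, -1.0, 0.5])              # last entry plays the role of the bias
points = rng.uniform(-1.0, 1.0, size=(200, 2))
z_all = np.hstack([points, np.ones((200, 1))])   # augmented inputs z = [p; 1]
margins = z_all @ x_star
keep = np.abs(margins) > 0.1                     # enforce a margin so that delta > 0
z_all, margins = z_all[keep], margins[keep]
targets = (margins > 0).astype(int)

delta = np.min(np.abs(margins))                  # smallest |x*^T z| over the data
A = np.max(np.sum(z_all**2, axis=1))             # largest squared input length
bound = A * (x_star @ x_star) / delta**2

x = np.zeros(3)                                  # x(0) = 0
updates = 0
for _ in range(10_000):                          # epoch cap, only as a safeguard
    changed = False
    for z, t in zip(z_all, targets):
        e = t - (1 if x @ z >= 0 else 0)         # hard-limit output, error in {-1, 0, 1}
        if e != 0:
            x = x + e * z                        # learning rule: x_new = x_old + e z
            updates += 1
            changed = True
    if not changed:                              # an error-free pass: converged
        break

print("weight updates:", updates)
print("upper bound:   ", bound)
```

In practice the observed number of updates is usually far below the bound, which is a worst-case guarantee and depends on the particular solution vector used to evaluate it.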
APPENDIX B. SIMULATION RESULTS

To  determine  the  capabilities  of  the  classifiers  studied,  two  types  of  simulations 
were  conducted.  The  first  set  of  tests  gauged  the  performance  of  the  different  classifiers 
by  creating  artificial  features  for  different  class  types.  Once  provided  with  this  initial 
assessment  of  the  different  classification  schemes,  the  second  simulation  measured  their 
ability  to  categorize  simulated  communication  signals.  Appendix  B  contains  the  results 
from  both  types  of  trials. 

Simulation results are presented in two forms. In Tables B-1 through B-36,
confusion matrices report classifier performance. A confusion matrix is an m x m matrix,
m being the number of categories in the classification problem. Read horizontally, each
confusion matrix lists the correct class type; vertically, the class type selected by the
classifier. The elements within each matrix indicate the percentage of objects (i.e.,
testing input data vectors) categorized as a certain class. In particular, the diagonal
elements give the percentage of correct classifications for a particular simulation
situation. Averaging these diagonal elements results in a performance index for that
particular classifier under the specified conditions. Disregarding slight deviation due to
round-off error, each table row sums to 100% for all classifiers except the perceptron
neural network. The confusion matrices for the perceptron neural networks do not show
rows that sum to 100% due to non-class typings as reported in Tables IV-1 and V-2.
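
The construction of a confusion matrix and its performance index can be sketched as follows. This is a minimal illustration, not the code used to generate the tables: the function names `confusion_percent` and `performance_index` and the twelve example labels are hypothetical. The final matrix is the one reported with INDEX 93.8 in Table B-3, used to show that averaging its diagonal reproduces that index.

```python
import numpy as np

def confusion_percent(actual, selected, n_classes):
    """Row-normalized confusion matrix: rows are actual classes, columns selected classes."""
    counts = np.zeros((n_classes, n_classes))
    for a, s in zip(actual, selected):
        counts[a, s] += 1
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)

def performance_index(matrix):
    """Average of the diagonal elements (percent correct per class)."""
    return float(np.mean(np.diag(matrix)))

# Hypothetical labels for a three-class problem (classes 0, 1, 2).
actual   = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
selected = np.array([0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 0, 2])

matrix = confusion_percent(actual, selected, 3)
print(matrix)
print("performance index:", performance_index(matrix))

# Matrix listed with INDEX 93.8 in Table B-3 (statistical classifier).
reported = np.array([[94.4, 2.8, 2.8],
                     [4.3, 93.3, 2.4],
                     [4.0, 2.2, 93.7]])
print("index from reported matrix:", round(performance_index(reported), 1))
```

This sketch assumes every test object is assigned to some class, so each row sums to 100%; the perceptron's non-class typings would require an additional "no class" tally and would leave the rows short of 100%.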

Tables B-1 through B-18 report results for the first set of simulations conducted:
classification of data objects consisting of artificial features. Tables B-19 through B-36
report results for the set of simulations conducted on simulated communication signals.
In these latter tables, π1, π2, and π3 correspond to the simulated 2-ASK, 2-PSK, and 2-FSK
classes of software-created signals.

Plots of the average performance indices permit visual analysis of the effect of
varying noise level and input space dimensionality. These graphs are provided as Figures
B-1 through B-6. Chapters IV and V contain performance index graphs that allow direct
comparison of the different classification methods.

For all tables and plots, MSNN Mod 1 refers to the MSNN variant with input
space preconditioning; MSNN Mod 2, the MSNN variant with projection space
normalization; and MSNN Mod 3, the MSNN variant utilizing the standard MSNN
performance parameter, MD, and the new training termination limit, VMR.
0.0

100 

SNR  =  20  dB 


SNR  =  10  dB 


INDEX: 100
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                    100     0.0     0.0
π2                    0.0     100     0.0
π3                    0.0     0.0     100

INDEX: 100
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                    100     0.0     0.0
π2                    0.0     100     0.0
π3                    0.0     0.0     100

SNR  =  5  dB 


INDEX: 100
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                    100     0.0     0.0
π2                    0.0     100     0.0
π3                    0.0     0.0     100

SNR  =  15  dB 

INDEX: 100
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                    100     0.0     0.0
π2                    0.0     100     0.0
π3                    0.0     0.0     100

INDEX: 99.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   99.6     0.1     0.2
π2                    0.6    99.0     0.4
π3                    0.2     0.2    99.6

SNR  =  0  dB 


SNR  =  -5  dB 


INDEX: 93.8
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   94.4     2.8     2.8
π2                    4.3    93.3     2.4
π3                    4.0     2.2    93.7

INDEX: 96.9
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   96.5     1.4     2.1
π2                    1.3    97.5     1.2
π3                    2.1     1.0    96.8

SNR  =  -10dB  SNR  =  -15  dB 


INDEX: 96.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   98.0     1.2     0.8
π2                    1.7    96.2     2.1
π3                    2.5     2.5    95.0

SNR  =  -20  dB 


Table  B-3.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Fifty-Features): 
Statistical  Classifier,  (see  App  B  cover  page  for  table  description) 
INDEX: 99.9
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   99.9     0.0     0.1
π2                    0.0     100     0.0
π3                    0.0     0.1    99.9

SNR  =  20  dB 


INDEX: 99.8
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   99.9     0.0     0.1
π2                    0.0    99.7     0.1
π3                    0.0     0.0    99.9

SNR  =  15  dB 


INDEX: 99.6
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   99.8     0.0     0.1
π2                    0.0    99.7     0.2
π3                    0.5     0.1    99.4

SNR  =  10  dB 


INDEX: 96.7
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   96.1     0.2     2.4
π2                    0.3    95.5     2.7
π3                    0.8     0.7    98.5

SNR  =  5  dB 


INDEX: 52.7
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   45.5     6.6    36.2
π2                    7.3    46.9    32.1
π3                   15.1    15.0    65.6

SNR  =  -5  dB 


INDEX: 32.1
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   21.9    10.8    49.3
π2                   14.4    21.1    48.1
π3                   15.7    15.0    53.3

SNR  =  -10  dB 


SNR  =  -15  dB 


INDEX: 28.3
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   16.5    14.0    50.6
π2                   12.4    15.4    51.8
π3                   14.7    14.5    52.9

SNR  =  -20  dB 


Table  B-5.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Ten-Features): 
Perceptron.  (see  App  B  cover  page  for  table  description) 


Table  B-6.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Fifty-Features): 
Perceptron.  (see  App  B  cover  page  for  table  description) 
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INDEX: 

99.6 

SELECTED 

7t,*  n2*  713* 

J  711 

g  n2 

H 

U 

<  % 

99.8 

0.2 

0.6 

99.3 

0.2 

0.2 

0.0 

99.7 

SNR  =  20  dB 


INDEX: 95.0
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   89.5     1.2     9.3
π2                    0.7    98.8     0.5
π3                    3.2     0.2    96.5

SNR  =  10  dB 


INDEX: 96.8
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   98.3     0.0     1.6
π2                    0.0    96.0     4.0
π3                    0.6     3.2    96.2

SNR  =  15  dB 


INDEX: 89.0
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   86.8     5.6     7.6
π2                    1.0    96.4     2.7
π3                    6.8     9.4    83.7

SNR  =  5  dB 


INDEX: 65.3
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   58.0    24.0    18.1
π2                   28.1    64.1     7.8
π3                   16.3     9.7    74.0

SNR  =  0dB 


INDEX: 48.2
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   54.7    22.8    22.6
π2                   25.4    50.8    23.8
π3                   27.4    33.6    39.0

SNR  =  -10  dB 


INDEX: 37.8
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   32.2    37.3    30.6
π2                   33.5    45.3    21.1
π3                   31.3    32.8    35.9

SNR  =  -20  dB 


INDEX: 54.5
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   51.4    21.6    27.0
π2                   22.4    56.4    21.2
π3                   18.5    25.7    55.9

SNR  =  -5  dB 


INDEX: 45.0
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   54.6    17.3    28.0
π2                   36.1    24.1    39.9
π3                   31.5    12.3    56.3

SNR  =  -15  dB 


Table B-7. Confusion Matrices for Simulated Feature Trials (Three-Class, Three-Features):
MSNN. (see App B cover page for table description)


1  INDEX: 

|  99.8 

Ttj* 

SELECTEE 

7t2* 

1 

7l3*  1 

*■ 

i  * 

<  7t3 

0.0 

0.3  1 

0.0 

100 

0.0  | 

0.1 

0.0 

99.9  1 

INDEX: 

53.6 


SNR  =  20  dB 


SNR  =  10  dB 


SELECTED 


SNR  =  0  dB 


SELECTED 


SELECTED 


0.0 


99.8 


0.0 


SNR  =  15  dB 


INDEX: 99.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   99.5     0.3     0.3
π2                    0.6    99.3     0.1
π3                    0.2     0.4    99.4

98.3 


0.9 


SNR  =  5  dB 


Tli* 

7t2* 

7l3* 

J  711 

3.5 

2.7 

U  712 

u 

<  n3 

4.0 

89.9 

6.1 

6.1 

6.3 

87.5 

SELECTED 

7li* 712* 


22.6 


70.9 


12.4 


SNR  =  -5  dB 


28.2 

25.8 

56.7 

22.1 

23.4 

58.2 

SELECTED 

712*  7t3* 


29.1  26.0 


Table  B-8.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Ten-Features): 
MSNN.  (see  App  B  cover  page  for  table  description) 


INDEX: 

100 


SELECTED 

7t  2* 


SNR  =  10  dB 


SNR  =  0  dB 


ACTUAL \ SELECTED     π1*     π2*     π3*
π1                    100     0.0     0.0
π2                    0.0     100     0.0
π3                    0.0     0.0     100

SNR  =  20  dB 

INDEX: 99.8
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   99.5     0.3     0.2
π2                    0.0     100     0.0
π3                    0.0     0.0     100

INDEX: 

99.9 


INDEX: 

99.5 


INDEX: 98.8
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   99.1     0.6     0.3
π2                    1.0    98.3     0.7
π3                    0.4     0.4    99.1

INDEX: 

95.3 


INDEX: 75.3
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   76.7    10.0    13.3
π2                   10.4    76.5    13.1
π3                   14.1    13.2    72.7

S] 

INDEX: 39.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   41.6    30.4    28.0
π2                   34.5    38.8    26.8
π3                   31.9    30.2    37.9

INDEX: 

56.0 


SELECTED 

712* 


0.0 


100 


0.0 


SNR  =  15  dB 


SELECTED 


0.2 


99.6 


0.1 


SNR  =  5  dB 


SELECTED 

7t2* 


2.5 


96.1 


2.2 


SNR  =  -5  dB 


SELECTED 


Table  B-9.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Fifty-Features): 
MSNN.  (see  App  B  cover  page  for  table  description) 
SNR = 20 dB
INDEX: 89.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   90.4     8.1     1.6
π2                    5.6    92.5     1.9
π3                    7.1     7.7    85.2

SNR = 15 dB
INDEX: 83.9
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   82.9     1.3    15.8
π2                    2.2    77.4    20.4
π3                    4.0     4.7    91.3

SNR = 10 dB
INDEX: 85.8
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   87.3     1.0    11.7
π2                    4.7    85.3    10.1
π3                   14.9     0.4    84.7

SNR = 5 dB
INDEX: 80.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   83.9     7.0     9.2
π2                    8.6    83.0     8.4
π3                   14.4    11.2    74.4

INDEX: 

62.6 

SELECTED 

%2*  rc3* 

1 

SELECTED 

7ti*  n?*  it** 

19 

54.8 

18.9 

26.4 

J  n71 

47.2 

25.5 

27.3 

53.1 

14.9 

£  712 

r ) 

17.8 

57.2 

25.0 

14.2 

5.9 

79.9 

w 

<  7C3 

19.5 

28.4 

52.2 

SNR  =  0  dB 

SNR  =  -5  dB 

INDEX:  SELECTED 

42,7  Jtl*  %2*  JI3* 

1 

INDEX: 

41.8 

SELECTED 

7ti*  Jt2*  11** 

I  j  44.9 

- 

30.9 

24.2 

J  711 

44.7 

28.4 

27.0 

£  712  29.9 

a 

44.1 

26.0 

g  712 

r ) 

28.2 

37.3 

34.5 

* 111 1 

28.3 

32.7 

39.0 

^  % 

28.1 

43.3 

3 

o 

iH 

1 

II 

S] 

VR  =  -15  dB 

INDEX: 

35.7 

SELECTED 

Til*  7t2*  7l3* 

63.7 

17.8 

18.5 

e  *> 

u 

60.6 

19.5 

20.0 

<  1t} 

58.3 

17.6 

24.0 

SNR  =  -20  dB 

Table  B-10.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Three-Features): 
MSNN  Mod  1.  (see  App  B  cover  page  for  table  description) 


INDEX: 

96.0 


SELECTED 

Jt2* 


2.2 


96.2 


3.0 


SNR  =  20  dB 


SELECTED 

n2* 


0.8 


97.4 


1.0 


SNR  =  10  dB 


SELECTED 

n2* 


2.6 


87.1 


8.4 


SNR  =  0  dB 


INDEX: 

55.7 


SELECTED 

n2* 


30.0 


INDEX: 

96.3 


INDEX: 

94.7 


INDEX: 

47.5 


SELECTED 


1.5 


97.7 


2. 


SNR=  15  dB 


SELECTED 

n2* 


0.7 


92.6 


1.1 


SNR  =  5  dB 


SELECTED 

n2* 


20.8 


68.7 


12.7 


SNR  =  -5  dB 


SELECTED 

Tli*  712*  713* 


24.2 


INDEX: 43.1
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   35.7    29.2    35.0
π2                   23.4    45.4    31.2
π3                   22.6    29.0    48.3

Table B-11. Confusion Matrices for Simulated Feature Trials (Three-Class, Ten-Features):
MSNN Mod 1. (see App B cover page for table description)
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INDEX: 

99.7 


j  711 

^  n2 

H 

U 

<  7t3 


SNR  =  10  dB 


SELECTED 

Ttj* 


0.0 


99.8 


0.1 


SNR  =  0  dB 


ACTUAL \ SELECTED     π1*     π2*     π3*
π1                    100     0.0     0.0
π2                    0.0     100     0.0
π3                    0.0     0.0     100

SNR  =  20  dB 

INDEX: 100
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   99.9     0.0     0.0
π2                    0.0     100     0.0
π3                    0.0     0.0     100

INDEX: 

100 


INDEX: 

100 


INDEX: 

96.6 


INDEX: 66.2
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   80.7    10.1     9.1
π2                   14.7    69.5    15.9
π3                   31.1    20.5    48.4

INDEX: 

53.4 


SELECTED 

7I2* 


0.0 


100 


0.0 


SNR  =  15  dB 


SELECTED 

n2* 


0.0 


100 


0.0 


SNR  =  5  dB 


SELECTED 

n2* 


2. 


97.5 


1.5 


SNR  =  -5  dB 


SELECTED 

n2* 


Table  B-12.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Fifty-Features): 
MSNN  Mod  1.  (see  App  B  cover  page  for  table  description) 
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INDEX: 

99.5 


INDEX: 

93.2 


INDEX: 

63.2 


SELECTED 

ji2* 


0.1 


99.9 


0.1 


SNR  =  20  dB 


SELECTED 


0.8 


98.7 


1.5 


SNR=  10  dB 


SELECTED 

n2* 


23.2 


59.2 


9.8 


SNR  =  0  dB 


INDEX: 

96.8 


INDEX: 

87.1 


INDEX: 

54.7 


INDEX: 48.0
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   56.6    19.3    24.1
π2                   28.2    44.8    27.0
π3                   28.9    28.7    42.4

SI 

INDEX: 36.5
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   32.7    38.0    29.3
π2                   32.9    43.2    23.9
π3                   31.5    35.0    33.6

INDEX: 

42.2 


SELECTED 


SNR  =  15  dB 


SELECTED 

ib* 


5.3 


95.7 


9.8 


SNR  =  5  dB 


SELECTED 


43.6 

28.4 

16.5 

61.6 

13.7 

27.3 

SNR  =  -5  dB 


SELECTED 


Table  B-13.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Three-Features): 
MSNN  Mod  2.  (see  App  B  cover  page  for  table  description) 
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INDEX: 

99.4 


INDEX: 

90.7 


INDEX: 

53.4 


J  711 

< 

£ 

U 

<  n3 


INDEX: 

35.0 


J  711 
^  n2 

H 

u 

<  n3 


SELECTED 

7t2* 


0.0 


100 


0.0 


SNR  =  20  dB 


SELECTED 

7I2* 


0.4 


99.4 


.3 


SNR  =  10  dB 


SELECTED 

Til* 


SNR  =  0  dB 


SELECTED 

712* 


31.8 


INDEX: 

99.9 


INDEX: 

98.2 


Tti* 

SELECTED 

7t2* 

n3* 

45.7 

28.6 

25.7 

20.9 

56.5 

22.6 

18.4 

58.1 

SELECTED 

7^2* 


0.0 


99.8 


0.0 


SNR  =  15  dB 


SELECTED 

7t2* 


0.9 


98.4 


1.0 


SNR  =  5  dB 


SNR  =  -5  dB 


SELECTED 

7t2* 


30.5 


40.9 


30.4 


SNR  =  -15  dB 


Table  B-14.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Ten-Features):  MSNN 
Mod  2.  (see  App  B  cover  page  for  table  description) 
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INDEX: 

99.0 


SNR  =  10  dB 


SELECTED 

ji2* 


1.1 


99.2 


0.3 


SNR  =  0  dB 


INDEX: 100
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                    100     0.0     0.0
π2                    0.0     100     0.0
π3                    0.0     0.0     100

SNR  =  20  dB 

INDEX: 99.7
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   99.7     0.3     0.0
π2                    0.2    99.7     0.0
π3                    0.2     0.0    99.8

INDEX: 

99.9 


INDEX: 

99.2 


*■ 

I 

U 

◄  % 


INDEX: 

95.6 


INDEX: 71.9
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   73.2    12.0    14.8
π2                   11.6    74.2    14.2
π3                   15.5    16.2    68.2

SI 

INDEX: 36.3
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   36.9    35.2    28.0
π2                   32.9    39.3    27.8
π3                   32.8    34.6    32.7

INDEX: 

47.4 


SELECTED 

n2* 


0.0 


100 


0.0 


SNR  =  15  dB 


SELECTED 

7I2* 


0.3 


98.7 


0.4 


SNR  =  5  dB 


SELECTED 

n2* 


1.8 


SNR  =  -5  dB 


SELECTED 

n2* 


27.8 


49.1 


27.2 


=  -15  dB 


Table  B-15.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Fifty-Features): 
MSNN  Mod  2.  (see  App  B  cover  page  for  table  description) 
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Table  B-16.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Three-Features): 
MSNN  Mod  3.  (see  App  B  cover  page  for  table  description) 
INDEX: 99.2
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   98.1     0.0     1.9
π2                    0.3    99.7     0.0
π3                    0.1     0.0    99.9

INDEX: 

97.1 


INDEX: 

79.0 


INDEX: 

46.0 


INDEX: 

34.7 


SNR  =  20  dB 


SELECTED 

712* 


2.1 


95.9 


1.2 


SNR  =  10  dB 


SELECTED 

7t2* 


6.7 


79.9 


12.1 


SNR  =  0  dB 


SELECTED 

7t2* 


33.1 


SELECTED 

71 2* 


38.6 


I  * 

U 

<  ns 


INDEX: 

90.5 


INDEX: 

39.2 


SELECTED 

7I2* 


0.0 


98.9 


0.6 


SNR  =  15  dB 


SELECTED 

7I2* 


2.2 


93.1 


5.6 


SNR  =  5  dB 


SELECTED 

712* 


20.5 


57.4 


12.6 


SNR  =  -5  dB 


SELECTED 

7t 2* 


Table  B-17.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Ten-Features): 
MSNN  Mod  3.  (see  App  B  cover  page  for  table  description) 
INDEX: 100
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                    100     0.0     0.0
π2                    0.0     100     0.0
π3                    0.0     0.0     100

SNR  =  20  dB 


INDEX: 98.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   97.2     0.7     2.2
π2                    1.4    98.1     0.5
π3                    0.0     0.1    99.9

SNR  =  10  dB 


INDEX: 91.5
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   89.4     4.4     6.3
π2                    4.2    93.5     2.3
π3                    4.9     3.6    91.5

SNR  =  0  dB 


SELECTED 

Ttl*  n2*  7l3* 

-i  *■ 

1  * 

O 

-<  7C3 

mm 

18.6 

21.2 

1  22.2 

59.5 

18.3 

25.9 

23.6 

50.5 

SNR  =  -10  dB 


SELECTED 

Til*  Jt2*  Jt3* 

J  711 

99.8 

0.0 

0.2 

g  % 

rj 

3.2 

0.6 

■<  7t3 

0.0 

0.0 

100 

SNR  =  15  dB 


INDEX: 97.7
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   96.9     2.1     1.0
π2                    1.7    97.2     1.1
π3                    0.6     0.2    99.1

SNR  =  5  dB 


INDEX: 78.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   79.4     7.2    13.4
π2                    9.9    77.0    13.1
π3                   13.0     8.3    78.6

SNR  =  -5  dB 


1  INDEX: 
43.8 

SELECTED 

Til*  %2*  7t3* 

1  J  ^ 

27.3 

28.0 

g  % 

CJ 

30.3 

42.0 

27.7 

<  % 

27.1 

44.6 

SNR  =  -15  dB 


INDEX: 36.1
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   35.3    37.0    27.6
π2                   32.4    41.4    26.2
π3                   32.4    36.1    31.5

SNR  =  -20  dB 


Table  B-18.  Confusion  Matrices  for  Simulated  Feature  Trials  (Three-Class,  Fifty-Features): 
MSNN  Mod  3.  (see  App  B  cover  page  for  table  description) 


INDEX: 

75.2 


INDEX: 

67.9 


INDEX: 

44.6 


SELECTED 

7t2* 


38.1 


63.8 


0.0 


SNR  =  20  dB 


SELECTED 

n2* 7X3* 


44.8 


49.8 


0.2 


SNR=  10  dB 


SELECTED 


35.7 


35.2 


18.7 


SNR  =  0  dB 


SNR  =  -20  dB 


INDEX: 

71.3 


INDEX: 

60.2 


INDEX: 

35.9 


INDEX: 33.3
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   33.5    30.0    36.5
π2                   34.3    29.8    35.8
π3                   33.4    30.0    36.6

R  x  1 

INDEX: 33.3
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   37.4    30.9    31.7
π2                   36.7    31.6    31.7
π3                   38.3    30.8    30.9

INDEX: 

33.6 


INDEX: 

33.3 


J  711 

< 

P  tc2 

H 

CJ 

<  n3 


SELECTED 

7t2* 


42.5 


56.5 


0.0 


SNR  =  15  dB 


SELECTED 

71?*  «a* 


40.6 


42.9 


2.3 


SNR  =  5  dB 


SELECTED 

7I2* 


30.6 


33.2 


28.6 


SNR  =  -5  dB 


SELECTED 


32.7 


32.3 


32.2 


SNR  =  -15  dB 


SELECTED 

7t2* 


0.0 


No  Noise 


Table  B-19.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Fifty-One  Features): 
Statistical  Classifier,  (see  App  B  cover  page  for  table  description) 
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SNR  =  10  dB 


SNR  =  0  dB 


SNR  =  -10  dB 


INDEX: 

33.3 


SELECTED 


SNR  =  -20  dB 


INDEX: 79.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   68.0    32.0     0.0
π2                   29.9    70.1     0.0
π3                    0.0     0.0     100

SNR  =  20  dB 

INDEX: 70.9
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   56.4    43.6     0.1
π2                   43.3    56.5     0.2
π3                    0.1     0.2    99.8

SELECTED 


I  7t,* 


60.0 


J  1 

I 

U 

•<  7C3 


7ti* 

SELECTEE 

7t2* 

713* 

mm 

42.0 

17.7 

38.7 

43.2 

18.1 

13.8 

18.4 

67.8 

J  711 

|  "2 

u 

<  7t3 


INDEX: 

33.6 

71]* 

SELECTED 

7t2* 

7I3* 

J  711 

36.2 

32.3 

31.5 

g  7l2 

u 

<  n3 

36.4 

30.9 

32.7 

Kill 

32.2 

33.7 

INDEX: 

33.3 


40.0 


64.3 


0.0 


SNR  =  15  dB 


SELECTED 

Th* 


51.2 

45.5 

44.5 

50.6 

1.8 

2.9 

SNR  =  5  dB 


SELECTED 

712* 713* 


34.5 


34.1 


28.0 


SNR  =  -5  dB 


I  71!* 


33.7 


SELECTED 

7t2* 


32.0 


32.1 


32.8  , 


SNR  =  -15  dB 


SELECTED 

7t2* 


|  40.0 

0.0 

|  40.0 

0.0 

1  40-0 

0.0 

No  Noise 


Table  B-20.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Twenty-Six  Features): 
Statistical  Classifier,  (see  App  B  cover  page  for  table  description) 
SNR = 20 dB
INDEX: 78.8
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   68.0    32.0     0.0
π2                   31.7    68.3     0.0
π3                    0.0     0.0     100

SNR = 15 dB
INDEX: 75.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   65.5    34.3     0.2
π2                   39.2    60.7     0.1
π3                    0.0     0.0     100

SNR = 10 dB
INDEX: 73.0
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   61.9    37.5     0.6
π2                   41.7    57.7     0.6
π3                    0.4     0.1    99.4

SNR = 5 dB
INDEX: 65.9
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   51.4    45.8     2.8
π2                   42.7    53.6     3.7
π3                    3.0     4.3    92.7

SNR = 0 dB
INDEX: 51.7
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   43.0    42.6    14.5
π2                   42.7    43.5    13.8
π3                   15.1    16.2    68.7

SNR = -5 dB
INDEX: 38.3
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   37.1    37.0    26.0
π2                   37.5    35.9    26.6
π3                   27.8    30.3    41.9

SNR = -10 dB
INDEX: 34.7
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   34.2    31.6    34.3
π2                   33.0    32.3    34.7
π3                   31.3    31.2    37.5

SNR = -15 dB
INDEX: 33.5
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   31.6    33.7    34.7
π2                   32.0    33.8    34.2
π3                   30.8    34.0    35.2

SNR = -20 dB
INDEX: 33.1
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   31.6    34.7    33.7
π2                   31.9    35.9    32.2
π3                   32.3    35.9    31.7

No Noise
INDEX: 57.0
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   45.7  [illegible]  0.0
π2                   54.7    45.3     0.0
π3                    0.0    20.0    80.0

Table B-21. Confusion Matrices for Simulated Modulated Signals (Three-Class, Eleven-Features):
Statistical Classifier. (see App B cover page for table description)


INDEX: 

78.7 


INDEX: 

63.8 


INDEX: 

30.1 


SELECTED 

7t2* 


13.6 


71.4 


0.6 


SNR  =  20  dB 


SELECTED 

7t2* 


26.4 


47.0 


2.3 


SNR  =  10  dB 


SELECTED 


SNR  =  0  dB 


SELECTED 

Tfe* 


34.2 


34.8 


24.6 


SNR  =  -10  dB 


INDEX: 

71.7 


INDEX: 

50.7 


23.6 

31.3 

25.0 

31.7 

12.2 

72.8 

INDEX: 

33.1 


INDEX: 

31.2 


INDEX: 29.6
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   15.1    24.0    50.1
π2                   15.6    23.4    50.0
π3                   15.6    23.2    50.4

INDEX: 

92.9 


SELECTED 

n2* 


16.0 


53.3 


1.2 


SNR  =  15  dB 


3.6 


SNR  =  5  dB 


SELECTED 

n2* 


29.8 


29.9 


25.3 


SNR  =  -5  dB 


SELECTED 


20.3 


20.2 


20.7 


SNR  =  -15  dB 


SNR  =  -20  dB 


SELECTED 

n2* 


5.2 


87.2 


0.0 


No  Noise 


SELECTED 

712* 


Table  B-22.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Fifty-One  Features) 
Perceptron.  (see  App  B  cover  page  for  table  description) 


INDEX: 74.2
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   55.4    19.4     5.3
π2                   11.7    69.0     5.9
π3                    0.1     1.8    98.1

SNR  =  20  dB 


SNR  =  10  dB 


SNR  =  0  dB 


SNR  =  -10  dB 


SNR  =  -20  dB 


INDEX: 


SELECTED 


INDEX: 61.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   24.0    43.5    16.3
π2                    8.1    66.0    14.4
π3                    0.4     5.4    94.2

INDEX: 40.0
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   28.5    28.5    21.3
π2                   26.5    30.5    22.4
π3                   14.8    20.5    61.1

INDEX: 28.6
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   18.4    21.9    44.1
π2                   18.5    21.7    44.5
π3                   18.6    21.5    45.7

INDEX: 30.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   22.8    27.8    40.7
π2                   23.0    27.7    40.3
π3                   22.8    27.7    40.6

SNR  =  -5  dB 


No  Noise 


ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   62.6    16.1     9.8
π2                   29.1    46.4    14.0
π3                    0.2     1.1    98.7

SNR  =  15  dB 

INDEX: 51.8
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   39.3    20.5    30.9
π2                   31.1    26.3    33.9
π3                    4.1     5.9    89.9

SNR  =  5  dB 

INDEX: 30.3
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   18.5    19.1    43.3
π2                   19.0    19.2    43.2
π3                   18.0    16.2    53.1

INDEX: 30.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   35.7    24.9    30.4
π2                   35.6    25.2    30.3
π3                   36.0    25.0    30.4

SNR  =  -15  dB 

INDEX: 87.7
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   79.5    10.0     4.5
π2                    6.6    83.9     5.4
π3                    0.0     0.2    99.8

Table  B-23.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Twenty-Six  Features): 
Perceptron.  (see  App  B  cover  page  for  table  description) 
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INDEX: 

56.1 


SELECTED 


39.5 

7.8 

0.8 

30.2 

0.4 

4.8 

0 

95.2 

SNR  =  20  dB 


SELECTED 

7t?* 


29.7 


46.1 


1.8 


SNR  =  10  dB 


SNR  =  0  dB 


SNR  =  -10  dB 


SNR  =  -20  dB 


INDEX: 

58.2 


INDEX: 41.2
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   30.6    23.3    37.5
π2                   30.2    23.8    37.3
π3                   15.5    12.5    69.3

INDEX: 

29.6 


INDEX: 30.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   10.5    28.2    51.3
π2                   10.5    28.7    51.0
π3                   10.4    28.7    51.8

INDEX: 29.9
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   11.4    37.1    40.9
π2                   11.0    37.4    41.0
π3                   11.2    37.2    40.9

INDEX: 

83.8 


SELECTED 

n 2* 


9.9 


29.7 


0.6 


SNR  =  15  dB 


SNR  =  5  dB 


SELECTED 


30.5 


30.8 


32.3 


SNR  =  -5  dB 


SNR  =  -15  dB 


SELECTED 

7I2* 


4.1 


78.1 


0.0 


No  Noise 


Table B-24. Confusion Matrices for Simulated Modulated Signals (Three-Class, Eleven-Features):
Perceptron. (see App B cover page for table description)
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INDEX: 

89.0 


INDEX: 

76.1 


SELECTED 

n2* 


13.9 


80.9 


0.0 


SNR  =  20  dB 


SELECTED 

n2* 


32.3 


61.2 


0.4 


SNR  =  10  dB 


SNR  =  -10  dB 


SNR  =  -20  dB 


INDEX: 

82.6 


INDEX: 

70.4 


INDEX: 56.3
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   44.0    43.7    12.3
π2                   42.1    44.9    13.0
π3                   10.6     9.4    80.0

SNR  =  0  dB 

INDEX: 34.7
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   33.4    34.4    32.2
π2                   34.1    34.8    31.1
π3                   31.5    32.7    35.8

INDEX: 

42.5 


INDEX: 33.7
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   31.7    33.7    34.6
π2                   32.4    34.1    33.4
π3                   31.8    32.7    35.5

INDEX: 

94.8 


SELECTED 

712* 


25.0 


0.1 


SNR  =  15  dB 


SELECTED 

712* 


42.9 


58.8 


1.9 


SNR  =  5  dB 


SELECTED 

712* 


37.0 


36.7 


23.5 


SNR  =  -5  dB 


SELECTED 


34.6 


35.0 


34.4 


SNR  =  -15  dB 


SELECTED 

7I2* 


6.3 


90.8 


0.1 


No  Noise 


Table B-25. Confusion Matrices for Simulated Modulated Signals (Three-Class, Fifty-One Features):
MSNN. (see App B cover page for table description)
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INDEX: 

86.0 


INDEX: 

77.7 


INDEX: 

34.8 


J  711 

S  m 

V 

<  7t3 


SELECTED 


SNR  =  20  dB 


SELECTED 


SNR  =  10  dB 


SELECTED 


SNR  =  0  dB 


SELECTED 

n2* 


SNR  =  -10  dB 


SELECTED 


SNR  =  -20  dB 


INDEX: 

81.5 


INDEX: 

69.8 


m 


j  Ki 

|  % 

U 

<  Jt3 


INDEX: 

93.8 


SELECTED 


SNR  =  15  dB 


SNR  =  5  dB 


SELECTED 


SNR  =  -5  dB 


SELECTED 


SELECTED 

ni*  n2* 


SELECTED 


No  Noise 


SNR  =  -15  dB 


Table  B-26.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Twenty-Six  Features): 
MSNN.  (see  App  B  cover  page  for  table  description) 
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INDEX: 

81.8 


INDEX: 

75.8 


INDEX: 

33.2 


P  711 

g  7t2 

d 

•<  Jl3 


SELECTED 

n2* 


24.2 


69.6 


0.0 


SNR  =  20  dB 


SELECTED 

n2* 


35.9 


65.2 


0.7 


SNR  =  10  dB 


SNR  =  -10  dB 


SELECTED 

n2* 


29.0 


29.5 


30.3 


SNR  =  -20  dB 


INDEX: 

78.7 


INDEX: 

68.0 


INDEX: 55.3
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   46.9    37.9    15.2
π2                   42.9    41.4    15.8
π3                   10.5    12.0    77.5

SNR  =  0  dB 

INDEX: 34.5
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   30.1    35.8    34.1
π2                   29.0    36.1    34.9
π3                   29.0    33.6    37.4

INDEX: 

41.1 


INDEX: 

34.2 


INDEX: 

94.3 


SELECTED 

JIl  *  712* 713* 


31.4 


67.7 


0.0 


SNR  =  15  dB 


SELECTED 

n2* 


40.4 


56.2 


.3 


SNR  =  5  dB 


SELECTED 

7t2* 


32.8 


35.1 


23.1 


SNR  =  -5  dB 


SELECTED 

7I2* 


36.2 


39.2 


36.4 


SNR  =  -15  dB 


SELECTED 

7I2* 


6.3 


89.4 


0.0 


No  Noise 


Table  B-27.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Eleven-Features): 
MSNN.  (see  App  B  cover  page  for  table  description) 
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INDEX: 

33.4 


SNR  =  10  dB 


SNR  =  0dB 


SELECTED 


SNR  =  -10  dB 


SNR  =  -20  dB 


INDEX: 72.9
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   58.7    40.8     0.5
π2                   36.5    63.3     0.2
π3                    2.0     1.2    96.8

SNR  =  20  dB 

INDEX: 64.1
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   47.3    52.6     0.0
π2                   38.6    61.3     0.1
π3                    4.7    11.5    83.8

INDEX: 

68.7 


INDEX: 

55.3 


J  1 

§  * 
U 

<  n3 


INDEX: 43.9
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   36.4    42.8    20.8
π2                   36.2    42.8    21.0
π3                   22.5    25.0    52.5

34.9 

31.4 

35.7 

30.0 

34.3 

1  INDEX: 
33.2 

l  Tli* 

SELECTEE 

7t2* 

713*  1 

|j  711 

33.4 

32.2 

34.3  | 

=  % 
y 

32.2 

33.4  | 

◄  7t3 

la 

32.3 

34.1  | 

SELECTED 

Th* 


46.0 


57.6 


.2 


SNR  =  15  dB 


SELECTED 

71,* 


49.0 


50.2 


17.4 


SNR  =  5  dB 


SELECTED 

712* 


37.1 


37.2 


35.3 


SNR  =  -5  dB 


SELECTED 

7I2* 


29.7 


29.9 


29.7 


SNR  =  -15  dB 


No  Noise 


Table  B-28.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Fifty-One  Features): 
MSNN  Mod  1.  (see  App  B  cover  page  for  table  description) 
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INDEX: 

43.6 


INDEX: 

34.0 


INDEX: 

33.3 


INDEX: 71.4
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   48.1    51.5     0.3
π2                   31.3    68.2     0.4
π3                    1.3     0.7    97.9

SNR  =  20  dB 

INDEX: 66.3
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   46.8    52.7     0.4
π2                   34.2    65.3     0.5
π3                    5.2     8.1    86.7

SNR=  10  dB 


SELECTED 


42.6 


42.8 


25.1 


SNR  =  0  dB 


SELECTED 


33.0 


33.7 


32.6 


SNR  =  -10  dB 


SELECTED 

712* 


32.4 


32.3 


32.6 


SNR  =  -20  dB 


INDEX: 

68.9 


INDEX: 

57.7 


INDEX: 

35.5 


INDEX: 

33.4 


INDEX: 

64.4 


J  711 

|  "2 

<  7t3 


SELECTED 

7C2* 


49.4 


61.8 


2.1 


SNR  =  15  dB 


SELECTED 

712* 


48.6 


50.2 


14.3 


SNR  =  5  dB 


SELECTED 


31.5 


32.5 


29.3 


SNR  =  -5  dB 


SELECTED 


33.5 


33.4 


33.7 


SNR  =  -15  dB 


SELECTED 

7T2* 


90.6 


93.1 


0.0 


No  Noise 


Table  B-29.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Twenty-Six  Features): 
MSNN  Mod  1.  (see  App  B  cover  page  for  table  description) 
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46.4 


0.6 


SNR  =  20  dB 


SNR  =  10  dB 


SNR  =  -10  dB 


SNR  =  -20  dB 


INDEX: 

71.2 


SELECTED 

7t2* 


40.3 


57.8 


1.6 


SNR  =  15  dB 


INDEX: 

57.9 


SELECTED 

712* 


50.8 


57.0 


20.2 


SNR  =  5  dB 


INDEX: 47.7
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   41.4    41.0    17.6
π2                   39.5    42.0    18.6
π3                   20.0    20.1    59.9

SNR  =  0  dB 

INDEX: 33.6
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   34.2    32.9    32.9
π2                   33.5    33.4    33.1
π3                   33.7    33.2    33.2

32.6 


32.2 


SNR  =  -5  dB 


31.4 


30.6 


SNR  =  -15  dB 


INDEX: 

33.3 

l  71,* 

SELECTED 

7t2* 

7t3*  | 

J  711 

<< 

| 

35.6 

34.7  1 

&  n2 

H  2 

29.7 

35.6 

34.7  1 

< 

■1 

35.1 

34.5  | 

SELECTED 

7t2* 


16.5 


33.3 


0.0 


No  Noise 


Table B-30. Confusion Matrices for Simulated Modulated Signals (Three-Class, Eleven-Features):
MSNN Mod 1. (see App B cover page for table description)


SELECTED 

7t2* 


20.5 


83.0 


0.3 


SNR  =  20  dB 


SNR  =  0  dB 


SNR  =  -20  dB 


INDEX: 

81.3 


INDEX: 74.0
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   61.8    37.9     0.3
π2                   37.8    62.0     0.2
π3                    0.9     0.8    98.4

SNR  =  10  dB 

INDEX: 55.8
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   38.7    46.8    14.4
π2                   37.9    47.0    15.1
π3                    9.8     8.5    81.8

INDEX: 

67.4 


INDEX: 34.0
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   31.2    34.5    34.3
π2                   32.3    34.2    33.5
π3                   30.0    33.4    36.6

SNR  =  -10  dB 

INDEX: 33.5
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   34.3    31.7    34.1
π2                   34.9    31.6    33.5
π3                   34.0    31.3    34.7

SELECTED 

712* 


29.3 


74.9 


0.7 


SNR  =  15  dB 


SELECTED 

7I2* 


43.7 


53.7 


2.8 


SNR  =  5  dB 


11.2 


90.3 


0.3 


No  Noise 


INDEX: 41.9
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   37.1    34.7    28.2
π2                   35.7    34.4    29.9
π3                   21.8    24.0    54.2

SNR  =  -5  dB 

INDEX: 

33.1 

Tti* 

SELECTEE 

7l2* 

7l3* 

"7  711 

46.8 

27.5 

25.7 

g  7t2 

H 

46.8 

27.5 

25.7 

u 

•<  7I3 

47.2 

27.8 

SNR  =  -15  dB 


SELECTED 

7ti*  7t2* 713* 


Table  B-31.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Fifty-One  Features): 
MSNN  Mod  2.  (see  App  B  cover  page  for  table  description) 
INDEX: 83.2
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   75.7    24.3     0.0
π2                   25.8    74.1     0.1
π3                    0.0     0.1    99.9

SNR  =  20  dB 


INDEX: 77.7
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   68.4    31.6     0.0
π2                   34.7    65.2     0.1
π3                    0.2     0.3    99.5

SNR  =  10  dB 


INDEX: 55.0
ACTUAL \ SELECTED     π1*     π2*     π3*
π1                   48.0    39.8    12.2
π2                   46.3    41.7    12.0
π3                   12.9    11.9    75.3

SNR  =  0  dB 


INDEX:  SELECTED 

341  nt*  Jt2*  jt,* 

j  711  26.6 

38.9 

34.5 

g  712  27.2 

Q 

<1  n3  25.7 

37.2 

■B 

SNR  =  -10  dB 


INDEX: 

33.2 

7Ii* 

SELECTED 

tc2*  Jl3* 

J  711 

39.5 

25.4 

35.1 

g  712 

rj 

40.4 

24.7 

34.9 

7C3 

39.5 

25.2 

35.3 

SNR  =  -20  dB 


INDEX: 

80.8 

SELECTED 

n,*  n2*  ji3* 

J  711 

71.3 

28.6 

0.0 

e  *> 

rj 

28.7 

71.1 

0.2 

<  n3 

0.0 

0.1 

99.9 

SNR  =  15  dB 


INDEX: 

66.0 

SELECTED  I 

Tti*  n2*  jt3* 

J  711 

58.4 

38.5 

3.1 

g  ^ 

rj 

43.7 

51.5 

4.9 

<  n3 

6.6 

5.2 

88.2 

SNR  =  5  dB 


INDEX: 

42.3 

SELECTED 

ni*  n2*  n3* 

■J  711 

g  712 

W 

<  7t3 

28.3 

41.0 

30.7 

26.5 

40.5 

33.0  | 

16.6 

25.2 

58.3 

SNR  =  -5  dB 


INDEX: 

33.5 

SELECTED 

Jtl*  7t2*  7t3* 

1 

u 

<  7l3 

32.8 

28.4 

38.8 

33.1 

27.8 

39.1 

33.8 

26.3 

39.9 

SNR  =  -15  dB 


INDEX:  1 
92.2 

SELECTED  | 

Uj  *  n2*  n3* 

J  711 

84.6 

14.9 

0.5 

g  712 

rj 

7.8 

92.1 

0.1 

<  n3 

0.0 

0.1 

99.8 

No  Noise 


Table  B-32.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Twenty-Six  Features): 
MSNN  Mod  2.  (see  App  B  cover  page  for  table  description) 
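
The INDEX values in this table agree with the mean of the diagonal (correct-classification) entries of each confusion matrix. The sketch below checks this for the SNR = 20 dB matrix. Reading the rows as the actual class and the columns as the selected class, and treating INDEX as the diagonal mean, are assumptions inferred from the numbers; the table description on the App B cover page is the authoritative definition.

```matlab
% Confusion matrix for SNR = 20 dB, copied from Table B-32
% (assumed layout: rows = actual class, columns = selected class, in percent)
C = [75.7 24.3  0.0;
     25.8 74.1  0.1;
      0.0  0.1 99.9];

% Assumption: INDEX is the mean of the diagonal of C
index = mean(diag(C));

% Round to one decimal place for comparison with the tabulated INDEX
indexRounded = round(index*10)/10;
disp(['INDEX: ',num2str(indexRounded)])
```

The same check reproduces the tabulated INDEX for the other fully legible matrices in this table.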
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INDEX: 

81.1 


SELECTED 

Tfr* 


SNR  =  10  dB 


SNR  =  -10  dB 


SNR  =  -20  dB 


J  711 

68.8 

28.3 

3.0 

g  7l2 

H 

U 

<  n3 

24.9 

74.8 

0.3 

0.3 

0.0 

99.7 

SNR  =  20  dB 

INDEX: 

75.1 

7tj* 

SELECTEE 

7t2* 

7t3* 

J  * 

67.9 

31.3 

0.8 

<3 

P  7l2 

H 

40.1 

59.5 

0.4 

U 

<  7t3 

1.0 

1.2 

97.8 

INDEX: 

77.3 


INDEX: 

64.4 


INDEX: 

55.2 

7Ci* 

SELECTEE 

7C2* 

7I3* 

J  *• 

47.1 

36.8 

16.1 

g  "7 

u 

◄  7l3 

43.2 

40.1 

16.7 

9.6 

12.1 

78.3 

SNR  =  0  dB 

INDEX: 

34.2 

7tj* 

SELECTEE 

7l2* 

7t3* 

J  711 

27.1 

34.6 

38.3 

g  7l2 

H 

U 

◄  713 

26.9 

33.9 

39.2 

25.5 

32.8 

41.6 

INDEX: 

40.8 


INDEX: 

33.6 


INDEX: 

33.4 

7lj  * 

SELECTEE 

7t2* 

7t3*  | 

J  711 

39.3 

25.9 

34.8 

g  7l2 

H 

r  \ 

39.7 

25.9 

34.5 

w 

<  7t3 

38.6 

26.4 

35.0 

INDEX: 

91.4 


SELECTED 

7l2* 


38.1 


71.0 


0.2 


SNR  =  15  dB 


SELECTED 


39.8 


53.5 


8.1 


SNR  =  5  dB 


SELECTED 


SNR  =  -5  dB 


SELECTED 

7I2* 


43.3 


44.2 


43.7 


SNR  =  -15  dB 


SELECTED 

712*  713* 


13.1 


89.5 


0.0 


No  Noise 


Table  B-33.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Eleven-Features): 
MSNN  Mod  2.  (see  App  B  cover  page  for  table  description) 
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1  INDEX: 
88.7 

Tli* 

SELECTEE 

n2* 

7l3* 

I  * 

rj 

85.9 

14.1 

0.0 

19.5 

80.5 

0.0 

<  7I3 

0.0 

0.3 

99.7 

INDEX: 

33.7 


J  711 

|  * 

U 

7I3 


SNR  =  20  dB 


SNR=  10  dB 


SNR  =  -10  dB 


SELECTED 


34.7 


35.2 


34.0 


SNR  =  -20  dB 


INDEX: 

82.5 


INDEX: 

75.4 

SELECTEE 

n2* 

7t3* 

J  711 

|  "2 

u 

<  7C3 

66.3 

32.0 

1.7 

38.2 

60.4 

1.4 

0.1 

0.4 

99.5 

1  INDEX: 
56.2 

7t,* 

SELECTEE 

n2* 

! 

rc3*  1 

L  711 

< 

U 

<  n3 

44.5 

42.9 

12.6  j 

42.7 

44.1 

13.2  | 

10.8 

9.1 

80.1  1 

SNR  =  0  dB 

SELECTED 

■MMI 

7t,* 

n2* 

n3* 

J  711 

35.3 

31.3 

O  n2 

H 

rj 

33.9 

35.7 

30.4 

<  % 

31.3 

33.3 

35.4 

SELECTED 

U2* 


25.7 


73.4 


0.1 


SNR  =  15  dB 


SELECTED 


43.5 


58.6 


1.8 


SNR  =  5  dB 


SELECTED 


37.1 

36.3 

26.6 

35.9 

35.8 

28.4 

22.1 

23.8 

54.1 

SNR  =  -5  dB 


33.9 


34.5 


33.8 


SNR  =  -15  dB 


SELECTED 

7I2* 


5.2 


87.2 


0.1 


No  Noise 


Table  B-34.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Fifty-One  Features): 
MSNN  Mod  3.  (see  App  B  cover  page  for  table  description) 
SNR = 20 dB          INDEX: 85.8
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        77.6    22.3     0.0
π2        20.0    79.9     0.0
π3         0.0     0.0    99.9

SNR = 10 dB          INDEX: 77.6
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        69.5    30.4     0.2
π2        36.1    63.7     0.2
π3         0.2     0.3    99.5

SNR = 0 dB           INDEX: 56.0
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        45.5    42.2    12.3
π2        42.9    44.4    12.7
π3        11.0    10.9    78.1

SNR = -10 dB         INDEX: 34.8
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        33.8    33.1    33.1
π2        34.1    33.5    32.5
π3        31.4    31.4    37.2

SNR = -20 dB         INDEX: 33.0
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        36.6    32.8    30.6
π2        37.2    32.4    30.4
π3        36.5    33.6    29.9

SNR = 15 dB          INDEX: 81.2
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        73.0    27.0     0.0
π2        28.8    71.2     0.0
π3         0.0     0.5    99.5

SNR = 5 dB           INDEX: 69.8
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        56.1    42.9     0.9
π2        40.6    58.0     1.4
π3         2.1     2.5    95.4

SNR = -5 dB          INDEX: 41.7
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        37.2    36.6    26.2
π2        35.9    36.2    27.9
π3        22.6    25.8    51.7

SNR = -15 dB         INDEX: 33.5
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        31.4    34.1    34.5
π2        31.9    33.4    34.7
π3        32.6    31.7    35.7

No Noise             INDEX: 93.1
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        92.8     7.2     0.0
π2        13.3    86.7     0.0
π3         0.0     0.1    99.9

Table  B-35.  Confusion  Matrices  for  Simulated  Modulated  Signals  (Three-Class,  Twenty-Six  Features): 
MSNN  Mod  3.  (see  App  B  cover  page  for  table  description) 
SNR = 20 dB          INDEX: [illegible]
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        73.5    26.3     0.2
π2        28.3    71.7     0.0
π3         2.1     0.0    97.9

SNR = 10 dB          INDEX: [illegible]
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        63.9    35.8     0.4
π2        34.7    65.0     0.3
π3         0.5     0.7    98.8

SNR = 0 dB           INDEX: [illegible]
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        46.6    38.3    15.1
π2        42.4    15.9    [one entry illegible; column positions uncertain]
π3        10.5    11.7    77.8

SNR = -10 dB         INDEX: [illegible]
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        33.5    32.7    33.8
π2        32.4    32.5    35.1
π3        31.3    31.2    37.6

SNR = -20 dB         INDEX: [illegible]
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        35.8    30.9    33.3
π2        30.5    32.0    [one entry illegible; column positions uncertain]
π3        31.9    32.2    [one entry illegible; column positions uncertain]

SNR = 15 dB          INDEX: 78.5
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        68.4    31.5     0.1
π2        [illegible]  67.6     0.0
π3         0.2     0.2    99.6

SNR = 5 dB           INDEX: 68.0
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        55.3    40.8     3.9
π2        40.1     3.8    [one entry illegible; column positions uncertain]
π3         3.1     4.4    92.6

SNR = -5 dB          INDEX: 40.9
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        35.1    32.4    32.5
π2        35.6    34.3    30.0
π3        24.0    22.8    53.3

SNR = -15 dB         INDEX: 34.3
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        30.9    36.1    33.0
π2        29.0    39.0    32.0
π3        30.7    36.2    33.1

No Noise             INDEX: 92.6
              SELECTED
ACTUAL     π1*     π2*     π3*
π1        [illegible]   7.2     0.0
π2        13.4    86.6     0.0
π3         1.6     0.0    98.4


Table B-36. Confusion Matrices for Simulated Modulated Signals (Three-Class, Eleven-Features):
MSNN Mod 3. (see App B cover page for table description)
[Figure: plots of Ave Correct Classification (%); image not reproduced.]

Figure B-3. MSNN Performance Results.

[Figure: plots of Ave Correct Classification (%); image not reproduced.]

Figure B-4. MSNN Mod 1 Performance Results.


APPENDIX  C.  MATLAB  CLASSIFICATION  PROGRAMS 


This  section  contains  the  MATLAB  programs  used  to  generate  the  simulation 
results  discussed  in  Chapters  IV  and  V.  These  functions  are  categorized  as  either 
common  or  specific  to  a  particular  classification  scheme. 

A.  COMMON  PROGRAMS 


The common programs in this section comprise the main program; feature
simulation functions; modulated signal simulation and feature extraction functions; and
data conditioning and display routines.


1. Controlling Program: simmsnn_compare.m


%************************************************************************
% COMPARE classification methods
%
% 5 March 2000
% Miguel G. San Pedro
%************************************************************************

clear
format compact
format short e

global gloUsrReq
gloUsrReq = input('Skip all optional displays (Y/N): ','s');
global gloUsrPlot
gloUsrPlot = input('Plot learning curves (Y/N): ','s');

num_data = [];      % number of training realizations
class_mean = [];    % feature mean values
class_cov = [];     % feature covariance matrix
class_var = [];     % feature variance values
classData = [];     % training data set
testClass = [];     % testing data set
snr = [];           % training/testing signal SNR

save test\testClass.dat testClass -ascii -tabs

% ASK if simulate signal or simulate data
usrReq = input('Simulate <signal> or <*data*>: ','s');
disp(' ')

% GENERATE testing/training data
if (strcmp(usrReq,'signal'))   % strcmp is safe for replies of any length

    num_class = 3;       % number of signal classes
    A = 4;               % SET signal amplitude
    T = 1e-7;            % SET bit period (sec)
    fs = 5e8;            % SET bit sampling frequency (samples/sec)
    fc = 4e7;            % SET carrier frequency (Hz)
    n = linspace(0,T,fs*T);
    features = [];       % vector of distinguishing features
    tmFeatures = [];     % vector of class distinguishing features
    mnFeatures = [];     % vector of class distinguishing feature mean
    covFeatures = [];    % class covariance matrix
    varFeatures = [];    % vector of class distinguishing feature variance

    % DETERMINE signal features
    disp('EXTRACTING SIGNAL FEATURES.')
    features = detFeatures;
    [numRows,num_features] = size(features);
    disp(['Number of features: ',num2str(num_features)])
    disp(' ')

    % GENERATE signals
    num_data = input('Enter number of training signals (def=100): ');
    if (isempty(num_data))
        num_data = 100;
    end

    usrSNR = input('Add noise (Y/N): ','s');
    if (usrSNR == 'Y')
        snr = input('Enter signal SNR (default=0dB): ');
        disp(' ')
        if (isempty(snr))
            snr = 0;
        end
    else
        snr = 9999;
    end

    plotSignal('plot2ASK',A,T,fc,n,features,snr)
    plotSignal('plot2PSK',A,T,fc,n,features,snr)
    plotSignal('plot2FSK',A,T,fc,n,features,snr)

    [tmFeatures,mnFeatures,covFeatures,varFeatures] ...
        = genSignal('gen2ASK',num_data,A,T,fs,n,features,snr);
    classData = [classData;tmFeatures];
    class_mean = [class_mean mnFeatures];
    class_cov = [class_cov covFeatures];
    class_var = [class_var varFeatures];

    [tmFeatures,mnFeatures,covFeatures,varFeatures] ...
        = genSignal('gen2PSK',num_data,A,T,fs,n,features,snr);
    classData = [classData;tmFeatures];
    class_mean = [class_mean mnFeatures];
    class_cov = [class_cov covFeatures];
    class_var = [class_var varFeatures];

    [tmFeatures,mnFeatures,covFeatures,varFeatures] ...
        = genSignal('gen2FSK',num_data,A,T,fs,n,features,snr);
    classData = [classData;tmFeatures];
    class_mean = [class_mean mnFeatures];
    class_cov = [class_cov covFeatures];
    class_var = [class_var varFeatures];

    % GENERATE random test data
    load testClass.dat
    randTest = 100*randn(num_features,num_data*10);
    testClass = [testClass;randTest];
    save test\testClass.dat testClass -ascii -tabs

else

    % ASK user for input data; else set default values
    num_class = [];      % number of signal classes
    num_features = [];   % number of distinguishing features

    userInput = input('Enter user defined inputs (Y/N): ','s');
    if (userInput == 'Y')
        disp(' ')
        [num_data,num_class,num_features,class_mean,class_var] ...
            = userData(num_data,num_class,num_features,class_mean,class_var);
    else
        % default values
        num_data = 100;
        num_class = 3;
        num_features = 3;
        class_mean = 2*rand(num_features,num_class) - 1;

        usrSNR = input('Add noise (Y/N): ','s');
        if (usrSNR == 'Y')
            snr = input('Enter feature SNR (default=0dB): ');
            if (isempty(snr))
                snr = 0;
            end
            snrConst = 10^(snr/10);
            for k = 1:num_class
                cont = 1;
                classVar = [];
                varPower = num_features/snrConst;
                while(cont)
                    classVar = rand(num_features-1,1)/snrConst;
                    lastVar = varPower - sum(classVar);
                    if (lastVar >= 0)
                        classVar = [classVar' lastVar]';
                        cont = 0;
                    end
                end
                class_var = [class_var classVar];
            end
        else
            class_var = zeros(num_features,num_class);
        end
        % NOTE: with class_mean and classVar, construct data then
        %       covariance matrix
    end

    class_mean
    class_var

    %************************************************************************
    % GENERATE class training/testing data
    % NOTE: genclass_compare GENERATES/RETURNS training data and STORES
    %       testing realizations in work\test
    % dim(classData) = num_features*num_class x num_data
    [classData,class_cov] = genclass_compare(num_data,class_mean,class_var)

    [rowData,num_data] = size(classData);
    if (rowData ~= num_features*num_class)
        disp('ERROR in data field')
    end

end

%************************************************************************
% NORMALIZE training and testing data by standard deviation (Method2)
[classData_norm] = dataMethod2(classData,class_mean,class_var);

%************************************************************************
% PLOT performance parameter and error surfaces/contours over a range
% of w and b
plotMS(num_class,num_features,classData,classData_norm)

%************************************************************************
% SET NN training parameters
a1=20;      % epochs between updating display
a2=500;     % maximum number of epochs to train
a3=100;     % initial learning rate
a4=2;       % learning rate increase
a5=0.5;     % learning rate decrease
a6=0.9;     % momentum constant
a7=1.04;    % maximum error ratio
tp = [a1 a2 a3 a4 a5 a6 a7];

% INITIALIZE/GENERATE 5 sets of weight and bias values
w = 2*randn(num_features,5)-1;
b = randn(1,5);

% MONITOR MD, weight, bias update
%checkWB = [];
%save checkWB.dat checkWB -ascii -tabs

% INITIALIZE confusion matrix counters
% note: reset confusion matrix when change class number, feature number, or SNR
reset = input('Reset confusion matrix counters (Y/N): ','s');
if (reset == 'Y')
    typeA = zeros(num_class+1,num_class);
    typeB = zeros(num_class+1,num_class);
    typeB1 = zeros(num_class+1,num_class);
    typeC = zeros(num_class+1,num_class);
    typeStat = zeros(num_class+1,num_class);

    save typeA.dat typeA -ascii -tabs
    save typeB.dat typeB -ascii -tabs
    save typeB1.dat typeB1 -ascii -tabs
    save typeC.dat typeC -ascii -tabs
    save typeStat.dat typeStat -ascii -tabs
end

%************************************************************************
%************************************************************************
% A. TRAIN/TEST standard MSNN
cd Method_SP1
disp(' ')
disp('**************************************************')
disp('A. MSNN')
fig = 2000;

% type is CONFUSION MATRIX
% note: type tracks confusion matrix for these 5-runs
%       typeA tracks confusion matrix for multiple 5-run
%       individual runs tracked by confusion matrix in simmsnn.m (i.e., type1)
type = zeros(num_class+1,num_class);
save type.dat type -ascii -tabs

for m = 1:5
    disp(['Run ',num2str(m)])
    simmsnn('trms_sp',1,classData,num_features,w(:,m),b(1,m),tp,fig);
    disp(' ')
    fig = fig+1+sum(1:(num_class-1));
end

load type.dat
disp(' ')
for m = 1:num_class+1
    disp(['TYPE',num2str(m),': ',num2str(type(m,:))])
end
cd ..

load typeA.dat
[Arow,Acol] = size(typeA);
tempA = typeA(Arow-num_class:Arow,:);
tempA = tempA + type;
typeA = [typeA;tempA];
save typeA.dat typeA -ascii -tabs

%************************************************************************
%************************************************************************
% B. TRAIN/TEST MSNN with normalized projection space (MSNN Mod 2)
cd Method_SP5
disp(' ')
disp('**************************************************')
disp('B. MSNN with norm projection space (MSNN Mod 2)')
fig = 2500;

% type is CONFUSION MATRIX
type = zeros(num_class+1,num_class);
save type.dat type -ascii -tabs

for m = 1:5
    disp(['Run ',num2str(m)])
    simmsnn('trms_sp5',5,classData,num_features,w(:,m),b(1,m),tp,fig);
    disp(' ')
    fig = fig+1+sum(1:(num_class-1));
end

load type.dat
disp(' ')
for m = 1:num_class+1
    disp(['TYPE',num2str(m),': ',num2str(type(m,:))])
end
cd ..

load typeB.dat
[Brow,Bcol] = size(typeB);
tempB = typeB(Brow-num_class:Brow,:);
tempB = tempB + type;
typeB = [typeB;tempB];
save typeB.dat typeB -ascii -tabs

%************************************************************************
%************************************************************************
% B1. TRAIN/TEST MSNN and VMR termination reqmt (MSNN Mod 3)
cd Method_SP8
disp(' ')
disp('**************************************************')
disp('B1. MSNN with VMR termination (MSNN Mod 3)')
fig = 2500;

% type is CONFUSION MATRIX
type = zeros(num_class+1,num_class);
save type.dat type -ascii -tabs

for m = 1:5
    disp(['Run ',num2str(m)])
    simmsnn('trms_sp8',8,classData,num_features,w(:,m),b(1,m),tp,fig);
    disp(' ')
    fig = fig+1+sum(1:(num_class-1));
end

load type.dat
disp(' ')
for m = 1:num_class+1
    disp(['TYPE',num2str(m),': ',num2str(type(m,:))])
end
cd ..

load typeB1.dat
[B1row,B1col] = size(typeB1);
tempB1 = typeB1(B1row-num_class:B1row,:);
tempB1 = tempB1 + type;
typeB1 = [typeB1;tempB1];
save typeB1.dat typeB1 -ascii -tabs


%************************************************************************
%************************************************************************
% C. TRAIN/TEST MSNN with preconditioned input space (MSNN Mod 1)
cd Method_SP2
disp(' ')
disp('**************************************************')
disp('C. MSNN with precond input (MSNN Mod 1)')
fig = 3000;

% type is CONFUSION MATRIX
type = zeros(num_class+1,num_class);
save type.dat type -ascii -tabs

for m = 1:5
    disp(['Run ',num2str(m)])
    simmsnn_C(classData_norm,num_features,w(:,m),b(1,m),tp,fig);
    disp(' ')
    fig = fig+1+sum(1:(num_class-1));
end

load type.dat
disp(' ')
for m = 1:num_class+1
    disp(['TYPE',num2str(m),': ',num2str(type(m,:))])
end
cd ..

load typeC.dat
[Crow,Ccol] = size(typeC);
tempC = typeC(Crow-num_class:Crow,:);
tempC = tempC + type;
typeC = [typeC;tempC];
save typeC.dat typeC -ascii -tabs

%************************************************************************
%************************************************************************
% D. PERCEPTRON NN
disp(' ')
disp('**************************************************')
disp('D. Perceptron')
cd Method_SP7

% type is CONFUSION MATRIX
type = zeros(num_class+1,num_class);
save type.dat type -ascii -tabs
noType = 0;
save noType.dat noType -ascii -tabs

for m = 1:5
    disp(['Run ',num2str(m)])
    percptmClassifier(num_class,classData,w(:,m),b(:,m))
    disp(' ')
end

load type.dat
for m = 1:num_class+1
    disp(['TYPE',num2str(m),': ',num2str(type(m,:))])
end
load noType.dat
disp(['BAD TYPE: ',num2str(noType)])
cd ..

load typeD.dat
[Drow,Dcol] = size(typeD);
tempD = typeD(Drow-num_class:Drow,:);
tempD = tempD + type;
typeD = [typeD;tempD];
save typeD.dat typeD -ascii -tabs

%************************************************************************
%************************************************************************
%% E. TEST iaw Brunzell/Eriksson quadratic classifier
disp(' ')
disp('**************************************************')
disp('E. Statistical Classifier')
statClassifier(num_data,num_class,num_features,class_mean,class_cov)
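
Sections A through D share one bookkeeping pattern: after each batch of five runs, the batch confusion matrix `type` is added to the most recent cumulative block of the history matrix, and the result is appended as a new block. The sketch below walks through that pattern for two batches. The counts in `type` are made up for illustration, and reading the extra (num_class+1)-th row as an "unclassified" bucket is an assumption that the listing does not state.

```matlab
num_class = 3;

% History as written by the reset step: one block of zeros
typeA = zeros(num_class+1,num_class);

% Hypothetical confusion counts from one batch of five runs
type = [90  8  2;
         7 91  2;
         1  1 98;
         2  0  0];

for batch = 1:2
    [Arow,Acol] = size(typeA);
    tempA = typeA(Arow-num_class:Arow,:);   % most recent cumulative block
    tempA = tempA + type;                   % add this batch
    typeA = [typeA;tempA];                  % append the new cumulative block
end

lastBlock = typeA(end-num_class:end,:);     % running total after two batches
disp(lastBlock)
```

Because every block is cumulative, the last num_class+1 rows of the history hold the totals for all batches since the counters were reset.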

2.  Feature  Simulation 
a.  userData.m 

function [num_data,num_class,num_features,class_mean,class_var] ...
    = userData(num_data,num_class,num_features,class_mean,class_var)
%************************************************************************
% Function
%   - PROMPTS user for data specifications
%   - if no user data entered, default values used
%
% Use: [num_data,num_class,num_features,class_mean,class_var]
%          = userData(num_data,num_class,num_features,class_mean,class_var)
%
% Input/Returns
%   num_data:     number of training signals to construct
%   num_class:    number of signal classes
%   num_features: number of distinguishing features
%   class_mean:   'num_class' 'num_features'x1 vectors of class feature means
%   class_var:    'num_class' 'num_features'x1 vectors of class feature variances
%
% 25 January 2000
% Miguel G. San Pedro
%************************************************************************
disp('When asked for values, hit <enter> to use default values')
disp(' ')

num_data = input('Enter number of training signals (default=100): ');
if (isempty(num_data))
    num_data = 100;
end
disp(' ')

num_class = input('Enter number of classes (default=3): ');
if (isempty(num_class))
    num_class = 3;
end
disp(' ')

num_features = input('Enter number of features (default=3): ');
if (isempty(num_features))
    num_features = 3;
end
if (num_features < num_class)
    disp('ERROR: number of distinguishing features > number of classes')
end
disp(' ')

userData = input('Enter mean for each feature for all classes (Y/N): ','s');
if (userData == 'Y')
    for k = 1:num_class
        getData = input(['Enter mean for class',num2str(k),...
                         ' features (enter as column vector): ']);
        [rowData,colData] = size(getData);
        if (rowData*colData ~= num_features)
            disp('*** DATA ENTRY ERROR ***')
        else
            if (colData ~= 1)
                getData = reshape(getData,rowData*colData,1);
            end
        end
        class_mean(:,k) = getData;
    end
else
    class_mean = 2*rand(num_features,num_class) - 1;
end
disp(' ')

userData = input('Enter variance for each feature for all classes (Y/N): ','s');
if (userData == 'Y')
    for k = 1:num_class
        getData = input(['Enter variance for class',num2str(k),...
                         ' features (enter as column vector): ']);
        [rowData,colData] = size(getData);
        if (rowData*colData ~= num_features)
            disp('*** DATA ENTRY ERROR ***')
        else
            if (colData ~= 1)
                getData = reshape(getData,rowData*colData,1);
            end
        end
        class_var(:,k) = getData;
    end
else
    % Randomly DETERMINE variance and ADD white noise
    snr = [];
    class_var = [];
    usrSNR = input('Add noise (Y/N): ','s');
    if (usrSNR == 'Y')
        snr = input('Enter feature SNR (default=0dB): ');
        if (isempty(snr))
            snr = 0;
        end
        snrConst = 10^(snr/10);
        for k = 1:num_class
            cont = 1;
            classVar = [];
            varPower = num_features/snrConst;
            while(cont)
                classVar = rand(num_features-1,1)/snrConst;
                lastVar = varPower - sum(classVar);
                if (lastVar >= 0)
                    classVar = [classVar' lastVar]';
                    cont = 0;
                end
            end
            class_var = [class_var classVar];
        end
    else
        class_var = zeros(num_features,num_class);
    end
end

% NOTE: with class_mean and class_var, construct data then covariance matrix
return
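
The noise branch above splits a fixed variance budget of num_features/snrConst across the features of each class, so the mean feature variance is 1/snrConst. Because `rand` returns values strictly between 0 and 1, the first num_features-1 variances sum to less than (num_features-1)/snrConst, which means the remainder assigned to the last feature is always positive and the `while` loop accepts its first draw. The sketch below shows one such allocation; the SNR value is chosen only for illustration.

```matlab
num_features = 3;          % the listing's default
snr = 10;                  % illustrative feature SNR in dB
snrConst = 10^(snr/10);    % linear SNR

varPower = num_features/snrConst;             % total variance budget
classVar = rand(num_features-1,1)/snrConst;   % first N-1 variances
lastVar  = varPower - sum(classVar);          % remainder goes to the last feature
classVar = [classVar; lastVar];

disp(['Total variance: ',num2str(sum(classVar))])
disp(['Mean variance:  ',num2str(mean(classVar))])
```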


b.  genclass_compare.m 


function [difclass,class_cov] = genclass_compare(numData,class_mean,class_var);

%************************************************************************
% Function
%   - Randomly GENERATES 'numData' training realizations of 'num_class' classes (note:
%     num_class plotting limited to <= 5).
%   - CALCULATES covariance matrix of data for statistical analysis
%   - PRE-CONDITIONS class data for use by Method2 by normalizing data by standard
%     deviation, resulting in "testcl#" data (normalized data vice normalized
%     projections).
%   - GENERATES 10*'numData' test realizations.
%
% Use: [difclass,class_cov] = genclass_compare(numData,class_mean,class_var);
%
% Input   numData:    number of training signals to construct
%         class_mean: 'num_class' 'num_features'x1 vectors of class feature means
%         class_var:  'num_class' 'num_features'x1 vectors of class feature variances
%
% Returns difclass:   generated training data points
%         class_cov:  'num_class' 'num_features'x'num_features' covariance matrix
%
% Saves   at directory test/, testing realizations
%
% 14 January 2000
% Miguel G. San Pedro
%************************************************************************


plot_char = ['b*';'r+';'go';'cs';'md'];
class_cov = [];
difclass = [];

% TRAINING REALIZATIONS
figure(1)
orient tall
[num_features,num_class] = size(class_mean);

% GENERATE numData training realizations
for m = 1:num_class
    classData = sqrt(class_var(:,[m*ones(1,numData)])).*randn(num_features,numData) ...
        + class_mean(:,[m*ones(1,numData)]);
    class_cov = [class_cov cov(classData')];
    difclass = [difclass;classData];

    % PLOT first three features of each class
    subplot(211)
    plot3(classData(1,:),classData(2,:),classData(3,:),plot_char(m,:))
    hold on
    xlabel('First Feature');
    ylabel('Second Feature');
    zlabel('Third Feature');
    title('Training Data')
    box on
    grid on

    subplot(234)
    plot(classData(1,:),classData(2,:),plot_char(m,:))
    hold on
    xlabel('First Feature');
    ylabel('Second Feature');
    grid on

    subplot(235)
    plot(classData(1,:),classData(3,:),plot_char(m,:))
    hold on
    xlabel('First Feature');
    ylabel('Third Feature');
    grid on

    subplot(236)
    plot(classData(2,:),classData(3,:),plot_char(m,:))
    hold on
    xlabel('Second Feature');
    ylabel('Third Feature');
    grid on
end
hold off

%************************************************************************
% GENERATE
%   - numData*10 test realizations of each classes
%   - test_data realizations of random noise that should not type to any classes
test_data = numData*10;
testClass = [];

for k = 1:num_class
    cl_SD = [];
    cl_SD = sqrt(class_var(:,[k*ones(1,test_data)]));
    cl_Mean = [];
    cl_Mean = class_mean(:,[k*ones(1,test_data)]);
    trainData = cl_SD.*randn(num_features,test_data) + cl_Mean;
    testClass = [testClass;trainData];
end

% GENERATE non-class data for testing
nonClassData = 10*randn(num_features,test_data) - 5;
testClass = [testClass;nonClassData];
save test\testClass.dat testClass -ascii -tabs
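
The realizations in genclass_compare.m are drawn by replicating one column of `class_var` and `class_mean` across all samples with the index expression `(:,[m*ones(1,numData)])`. The sketch below shows that this index trick is equivalent to the arguably more readable `repmat` form. All numeric values are hypothetical and chosen only to keep the example small.

```matlab
numData = 4;                       % small illustrative sample count
class_mean = [0 5; 1 6; 2 7];      % hypothetical: 3 features x 2 classes
class_var  = [1 4; 1 4; 1 4];      % hypothetical feature variances
[num_features,num_class] = size(class_mean);
m = 2;                             % class to generate

% Index trick from the listing: repeat column m numData times
sdA = sqrt(class_var(:,m*ones(1,numData)));
muA = class_mean(:,m*ones(1,numData));

% Equivalent form using repmat
sdB = repmat(sqrt(class_var(:,m)),1,numData);
muB = repmat(class_mean(:,m),1,numData);

% One Gaussian realization per column, with the requested mean and variance
classData = sdA.*randn(num_features,numData) + muA;
disp(size(classData))
```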


3.  Modulated  Signal  Simulation  and  Feature  Extraction 
a. genSignal.m


function [featuresSave,meanSig,covSig,varSig] ...
    = genSignal(fxn,num_signals,A,T,f,n,features,snr)

% Function
%   - GENERATES training and testing signals
%
% Use: [featuresSave,meanSig,covSig,varSig]
%          = genSignal(fxn,num_signals,A,T,f,n,features,snr)
%
% Input   fxn:          string name of signal type to construct
%                       ('2-ASK', '2-PSK', or '2-FSK')
%         num_signals:  number of training signals to construct; constructs
%                       10*num_signals testing signals
%         A:            signal amplitude
%         T:            signal period
%         f:            carrier frequency
%         n:            time sample vector
%         features:     distinguishing features indices (from detFeatures.m)
%         snr:          signal SNR
%
% Returns featuresSave: distinguishing features extracted for classifying
%         meanSig:      mean of extracted features
%         covSig:       covariance matrix of extracted features
%         varSig:       variance of extracted features
%
% 31 January 2000
% Miguel G. San Pedro
%************************************************************************

% GENERATE training signals
featuresSave = [];
for k = 1:num_signals
    [signal,featuresSignal] = feval(fxn,A,T,f,n,features,snr);
    featuresSave = [featuresSave featuresSignal];
end

meanSig = mean(featuresSave,2);
covSig = cov(featuresSave');
[covSigRow,covSigCol] = size(covSig);
for k = 1:covSigRow
    for kk = 1:covSigCol
        if (~covSig(k,kk))    % element is zero
            covSig(k,kk) = 1e-10;
        end
    end
end
varSig = diag(covSig);

%goon = input('continue ','s');
%if goon == 'y'
%    varSig
%    meanSig
%end

% GENERATE testing signals
load testClass.dat
testClassSave = [];
for k = 1:10*num_signals
    [signal,testSignal] = feval(fxn,A,T,f,n,features,snr);
    testClassSave = [testClassSave testSignal];
end
testClass = [testClass;testClassSave];
save test\testClass.dat testClass -ascii -tabs

return


b.  gen2ASK.m,  gen2PSK.m,  gen2FSK.m 

function [signal,features2ASK] = gen2ASK(A,T,fc,n,features,snr)

%************************************************************************
% Function
%   - GENERATES a 2ASK signal
%
% Use: [signal,features2ASK] = gen2ASK(A,T,fc,n,features,snr)
%
% Input   A:            signal amplitude
%         T:            bit period
%         fc:           carrier frequency
%         n:            time sample vector
%         features:     distinguishing features indices (from detFeatures.m)
%         snr:          signal SNR
%
% Returns signal:       positive frequencies of Fourier transformed 2-ASK signal
%                       realization
%         features2ASK: distinguishing features spectral magnitudes
%
% 21 January 2000
% Miguel G. San Pedro
% GENERATE message
a = zeros(1,20);
while (sum(a) == 0)
    a = round(rand(1,20));
end

basis = A/sqrt(T)*sin(2*pi*fc*n);    % SET basis function
msg = [];
for kk = 1:length(a)
    msg = [msg a(kk)*basis];
end
[msgRow,msgCol] = size(msg);
v = reshape(msg,1,msgRow*msgCol);

% ADD white noise
if ((nargin >= 5) & (snr ~= 9999))
    energyV = v*v';
    varNoise = (energyV/length(n))/10^(snr/10);
    noise = sqrt(varNoise)*randn(size(v));
    v = v + noise;
end

% NORMALIZE the signal power
den = v*v';
v = v/sqrt(den);

% PRE-PROCESS signal
%   - use decision rule to extract points
[sigRow,sigCol] = size(v);
iter = floor(sigCol/250);    % discard leftover points
aveSig = zeros(1,1000);
for k = 1:iter
    % FFT signal
    block = v(1,250*k-249:250*k);
    sigFFT = abs(fft(block,1000));
    aveSig = aveSig + sigFFT;
end
signal = aveSig(1:length(aveSig)/2)/iter;

features2ASK = [];
if (nargin >= 5)
    features2ASK = signal(features)';
end

return


function [signal,features2PSK] = gen2PSK(A,T,fc,n,features,snr)

%****************************************************************************************
% Function
%   - GENERATES a 2PSK signal
%
% Use: [signal,features2PSK] = gen2PSK(A,T,fc,n,features,snr)
%
% Input   A:            signal amplitude
%         T:            bit period
%         fc:           carrier frequency
%         n:            time sample vector
%         features:     distinguishing features indices (from detFeatures.m)
%         snr:          signal SNR
%
% Returns signal:       positive frequencies of Fourier transformed 2-PSK signal
%                       realization
%         features2PSK: distinguishing features spectral magnitudes
%
% 21 January 2000
% Miguel G. San Pedro
%****************************************************************************************

% GENERATE message
a = 2*round(rand(1,20)) - 1;

basis = A*sqrt(2/T)*sin(2*pi*fc*n);   % SET basis function

msg = [];
for kk = 1:length(a)
    msg = [msg a(kk)*basis];
end

[msgRow,msgCol] = size(msg);
msg = reshape(msg,1,msgRow*msgCol);

v = msg;

% ADD white noise
if ((nargin >= 5) & (snr ~= 9999))
    energyV = v*v';
    varNoise = (energyV/length(n))/10^(snr/10);
    noise = sqrt(varNoise)*randn(size(v));
    v = v + noise;
end

% NORMALIZE the signal power
v = v/sqrt(v*v');

% PRE-PROCESS signal
%   - use decision rule to extract points
[sigRow,sigCol] = size(v);
iter = floor(sigCol/250);   % discard leftover points
aveSig = zeros(1,1000);
for k = 1:iter
    % FFT signal
    block = v(1,250*k-249:250*k);
    sigFFT = abs(fft(block,1000));
    aveSig = aveSig + sigFFT;
end

signal = aveSig(1:length(aveSig)/2)/iter;

features2PSK = [];
if (nargin >= 5)
    features2PSK = signal(features)';
end

return


function [signal,features2FSK] = gen2FSK(A,T,fc,n,features,snr)

%****************************************************************************************
% Function
%   - GENERATES a 2FSK signal
%
% Use: [signal,features2FSK] = gen2FSK(A,T,fc,n,features,snr)
%
% Input   A:            signal amplitude
%         T:            bit period
%         fc:           carrier frequency
%         n:            time sample vector
%         features:     distinguishing features indices (from detFeatures.m)
%         snr:          signal SNR
%
% Returns signal:       positive frequencies of Fourier transformed 2-FSK signal
%                       realization
%         features2FSK: distinguishing features spectral magnitudes
%
% 21 January 2000
% Miguel G. San Pedro
%****************************************************************************************
delf = 1/T;

% GENERATE message
a = round(rand(1,20));

basis = [];
for kk = 1:length(a)
    if (a(kk) == 1)
        basis = [basis sqrt(2/T)*sin(2*pi*fc*n)];
    else
        basis = [basis sqrt(2/T)*sin(2*pi*(fc+delf)*n)];
    end
end

msg = basis;
[msgRow,msgCol] = size(msg);
msg = reshape(msg,1,msgRow*msgCol);

v = A*msg;

% ADD white noise
if ((nargin >= 5) & (snr ~= 9999))
    energyV = v*v';
    varNoise = (energyV/length(n))/10^(snr/10);
    noise = sqrt(varNoise)*randn(size(v));
    v = v + noise;
end

% NORMALIZE the signal power
v = v/sqrt(v*v');

% PRE-PROCESS signal
%   - use decision rule to extract points
[sigRow,sigCol] = size(v);
iter = floor(sigCol/250);   % discard leftover points
aveSig = zeros(1,1000);
for k = 1:iter
    % FFT signal
    block = v(1,250*k-249:250*k);
    sigFFT = abs(fft(block,1000));
    aveSig = aveSig + sigFFT;
end

signal = aveSig(1:length(aveSig)/2)/iter;

features2FSK = [];
if (nargin >= 5)
    features2FSK = signal(features)';
end

return
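
The three generators share their noise-injection and spectral pre-processing steps. The following is a minimal sketch of those steps in isolation, not a replacement for the listings. It assumes a fixed 20-bit message and an illustrative setting `snr_dB = 10`; the name `snr_dB` and the fixed bit pattern are hypothetical, and `kron` stands in for the message-building loop.

```matlab
% Sketch of the noise-injection and spectral-averaging steps shared by
% gen2ASK, gen2PSK and gen2FSK. Parameter values follow detFeatures.m;
% snr_dB and the fixed bit pattern are illustrative assumptions.
A = 4; T = 1e-6; fc = 5e6;
n = linspace(0,T,50);                   % fs*T = 50 time samples per bit
a = [1 0 1 1 0 1 0 0 1 1 1 0 0 1 0 1 1 0 1 0];   % fixed 20-bit message
v = kron(a, A/sqrt(T)*sin(2*pi*fc*n));  % 2-ASK: each bit scales the basis

snr_dB = 10;                            % assumed SNR setting (dB)
energyV  = v*v';
varNoise = (energyV/length(n))/10^(snr_dB/10);   % same scaling as the listings
v = v + sqrt(varNoise)*randn(size(v));
v = v/sqrt(v*v');                       % unit-energy normalization

% Average the magnitude spectra of consecutive 250-sample blocks,
% each zero-padded to a 1000-point FFT.
iter = floor(length(v)/250);
aveSig = zeros(1,1000);
for k = 1:iter
    block = v(1,250*k-249:250*k);
    aveSig = aveSig + abs(fft(block,1000));
end
signal = aveSig(1:length(aveSig)/2)/iter;   % keep the first 500 bins
```

With 50 samples per bit and 20 bits, the realization holds 1000 samples, so four blocks are averaged and `signal` holds 500 spectral magnitudes from which the feature indices are drawn.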


c.  detFeatures.m,  extractFeatures.m

function [features] = detFeatures

%****************************************************************************************
% Function
%   - EXTRACTS feature indices to be used for signal classification
%
% Use: [features] = detFeatures
%
% Input   (none)
%
% Returns features: signal component indices for signal classification
%
% 21 January 2000
% Miguel G. San Pedro
%****************************************************************************************
clear

A = 4;       % SET signal amplitude
T = 1e-6;    % SET bit interval of signal (sec)
fs = 5e7;    % SET bit sampling frequency (samples/sec)
fc = 5e6;    % SET carrier frequency (Hz)
n = linspace(0,T,fs*T);

features = [];

% DETERMINE class1 features: 2-ASK
featuresSave = [];
for k = 1:1000
    [ASK,temp] = gen2ASK(A,T,fc,n);
    featuresLoc = extractFeatures('2ASK',ASK);
    if (k ~= 1)
        featuresSave = intersect(featuresSave,featuresLoc);
    else
        featuresSave = featuresLoc;
    end
end

features2ASK = featuresSave;
disp(size(features2ASK))

features = union(features,features2ASK);

% DETERMINE class2 features: 2-PSK
featuresSave = [];
for k = 1:1000
    [PSK,temp] = gen2PSK(A,T,fc,n);
    featuresLoc = extractFeatures('2PSK',PSK);
    if (k ~= 1)
        featuresSave = intersect(featuresSave,featuresLoc);
    else
        featuresSave = featuresLoc;
    end
end

features2PSK = featuresSave;
disp(size(features2PSK))

features = union(features,features2PSK);

% DETERMINE class3 features: 2-FSK
featuresSave = [];
for k = 1:1000
    [FSK,temp] = gen2FSK(A,T,fc,n);
    featuresLoc = extractFeatures('2FSK',FSK);
    if (k ~= 1)
        featuresSave = intersect(featuresSave,featuresLoc);
    else
        featuresSave = featuresLoc;
    end
end

features2FSK = featuresSave;
disp(size(features2FSK))

features = union(features,features2FSK);

return


4.  Data  Conditioning  and  Display


a.  dataMethod2.m

function [classData_norm] = dataMethod2(classData,class_mean,class_var)

%****************************************************************************************
% Function
%   - NORMALIZES training and testing data by class standard deviation for use in Method2
%
% Use: [classData_norm] = dataMethod2(classData,class_mean,class_var)
%
% Input   classData:      generated training data
%         class_mean:     'num_class' 'num_features'x1 vectors of class feature means
%         class_var:      'num_class' 'num_features'x1 vectors of class feature
%                         variances
%
% Returns classData_norm: normalized training data
%
% Saves   at directory test/, normalized testing realizations
%
% 14 January 2000
% Miguel G. San Pedro
%****************************************************************************************

classData_norm = [];
[num_features,num_class] = size(class_mean);
[rowData,num_data] = size(classData);

% NORMALIZE training data by standard deviation (Method2)
if (num_features*num_class ~= rowData)
    disp('Note: INPUT ERROR')
else
    for k = 1:num_class
        knum_feat = k*num_features;
        data = classData(knum_feat - num_features + 1:knum_feat,:);
        data_adj = (data - class_mean(:,[k*ones(1,num_data)]))...
            ./sqrt(class_var(:,[k*ones(1,num_data)]))...
            + class_mean(:,[k*ones(1,num_data)]);
        classData_norm = [classData_norm; data_adj];
    end
end

% NORMALIZE testing data by standard deviation (Method2)
testClass_norm = [];
load test\testClass.dat
[rowData,num_test] = size(testClass);

if (num_features*(num_class+1) ~= rowData)
    disp('Note: INPUT ERROR')
else
    for k = 1:num_class+1
        knum_feat = k*num_features;
        data = testClass(knum_feat - num_features + 1:knum_feat,:);
        data_adj_save = [];
        for kk = 1:num_class
            data_adj = (data - class_mean(:,[kk*ones(1,num_test)]))...
                ./sqrt(class_var(:,[kk*ones(1,num_test)]))...
                + class_mean(:,[kk*ones(1,num_test)]);
            data_adj_save = [data_adj_save; data_adj];
        end
        testClass_norm = [testClass_norm data_adj_save];
    end
end

save test\testClass_norm.dat testClass_norm -ascii -tabs
return
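
The Method2 preconditioning above scales each feature's deviation from its class mean by the class standard deviation and then restores the mean. A small worked sketch of that single step follows, using hypothetical values for one class with two features; the variable `idx` is introduced here only to shorten the column-replication index used in the listing.

```matlab
% Worked sketch of the Method2 normalization for one class.
% All numeric values are hypothetical.
class_mean = [2; 10];            % feature means
class_var  = [4; 25];            % feature variances
data = [4 0 2; 15 5 10];         % three realizations stored as columns
num_data = size(data,2);
idx = ones(1,num_data);          % replicates column 1 num_data times

data_adj = (data - class_mean(:,idx))./sqrt(class_var(:,idx)) ...
           + class_mean(:,idx);
disp(data_adj)
```

Each row of `data_adj` remains centered on the original class mean, while deviations are expressed in units of the class standard deviation: here the rows become `[3 1 2]` and `[11 9 10]`.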


b.  plotMS.m,  errsurf_sp.m


function plotMS(num_class,num_features,classData,classData_norm)

%****************************************************************************************
% Function
%   PLOTS projection of test data using weights and bias determined by the mean
%   separator neural network
%
% Use: plotMS(num_class,num_features,classData,classData_norm)
%
% Input   num_class:      number of signal classes
%         num_features:   number of distinguishing features
%         classData:      class data training set
%         classData_norm: class data training set (normalized - Method2)
%
% Limitations: can plot only 1 feature classes
%
% Returns (none)
%
% 12 January 2000
% Miguel G. San Pedro
%****************************************************************************************
global gloUsrReq

w1 = [];
if (gloUsrReq == 'N')
    userReq = input('Plot Mean Separator and Error surface and contours (Y/N): ','s');
    if (userReq == 'Y')
        f = ['meansep_sp1';'meansep_sp2';'meansep_sp3';'meansep_sp5'];
        w1 = input('Enter weight/bias range (default -100:100): ');
        b1 = w1;
        if (isempty(w1))
            w1 = [-50:.25:50];
            b1 = w1;
        end

        for k = 1:4
            for m = 1:num_class
                mnum_feat = m*num_features;
                for mm = m+1:num_class
                    mmnum_feat = mm*num_features;
                    if (k ~= 2)
                        cl1 = classData(mnum_feat - num_features + 1:mnum_feat,:);
                        cl2 = classData(mmnum_feat - num_features + 1:mmnum_feat,:);
                        p = [cl1;cl2];
                    else
                        cl1 = classData_norm(mnum_feat - num_features + 1:mnum_feat,:);
                        cl2 = classData_norm(mmnum_feat - num_features + 1:mmnum_feat,:);
                        p = [cl1;cl2];
                    end
                    errsurf_sp(p,w1,b1,f(k,:));
                end
            end
        end
    end
end

return


function m = errsurf_sp(p,wv,bv,f)

%****************************************************************************************
% Function
%   PLOTS the error surface and error contours of a mean separator neural network over a
%   range of weights and biases
%
% Use:    m = errsurf_sp(p,wv,bv,f)
%
% Input   p:  2xQ matrix of input vectors. First row - feature of class 1; second row -
%             feature of class 2
%         wv: column vector of weights
%         bv: column vector of biases
%         f:  transfer function (optional, default - meansep_sp5)
%
% Returns m:  matrix of error values over wv and bv.
%
% Example p  = [-6.0 -6.1 -4.1 -4.0 +4.0 +4.1 +6.0 +6.1;
%               +0.0 +0.0 +.97 +.99 +.01 +.03 +1.0 +1.03];
%         wv = (-1:.1:1)';
%         bv = (-2.5:.25:2.5)';
%         es = errsurf_sp(p,wv,bv,'meansep_sp5');
%
% 5 January 2000
% Miguel G. San Pedro
%****************************************************************************************

if nargin < 3, error('Not enough input arguments.'); end
if (nargin == 3)
    f = 'meansep_sp5';
end

[pRow,pCol] = size(p);
p1 = p(1,:);
p2 = p(2,:);

if (f == 'meansep_sp1')
    t = -400;
end

if (f == 'meansep_sp2')   % for meansep_sp2, refer to notes in meansep_sp2 function
                          % code
    t = -400;
    f = 'meansep_sp1';
end

if (f == 'meansep_sp5')   % for MSNN norm proj var, no identifiable optimum value.
                          % Algorithm is such that want to increase mean spread and
                          % decrease sum of variance. Result wanted is large
                          % magnitude for value of performance parameter. Therefore,
                          % set t=0 ==> error plot and performance plot are the same.
    t = 0;
end

m = zeros(length(bv),length(wv));
for k = 1:length(wv)
    for kk = 1:length(bv)
        pp(kk,k) = feval(f,p1,p2,wv(k),bv(kk));
        if (f == 'meansep_sp3')
            if (pp(kk,k) <= 400)
                t = 0;
            else
                t = 1600;
            end
        end
        m(kk,k) = (t - pp(kk,k))^2;   % squared error calculation
    end
end

% PLOT performance parameter surface and contours
figure
orient landscape
subplot(221)
grid
mesh(bv,wv,pp)
xlabel('bias')
ylabel('weight')
zlabel('Mean Separator')
title(['Performance Parameter Surface (',f,')'])

subplot(222)
grid
contour(bv,wv,pp,10)
xlabel('bias')
ylabel('weight')
title(['Performance Parameter Contours (',f,')'])

% PLOT error surface and contours
subplot(223)
grid
mesh(bv,wv,m)
xlabel('bias')
ylabel('weight')
zlabel('error')
title(['Error Surface (',f,')'])

subplot(224)
grid
contour(bv,wv,m,10)
xlabel('bias')
ylabel('weight')
title(['Error Contours (',f,')'])

return

c.  dispProjection.m,  plotProjection.m,  dispWeightBias.m 

function dispProjection(o,r,numTestPts,method)

%****************************************************************************************
% Function
%   DISPLAYS the projection of test data using weights and bias determined by the mean
%   separator neural network
%
% Use: dispProjection(o,r,numTestPts,method)
%
% Input   o:          matrix of all test data projection
%         r:          matrix of class identification projection
%         numTestPts: number of test data points
%         method:     method number
%
% Returns (none)
%
% 12 January 2000
% Miguel G. San Pedro
%****************************************************************************************

% DISPLAY class type identifiers and testing data projection (considers each class
% separately)
[n,all_classes] = size(o);
num_class = all_classes/numTestPts - 1;   % -1 so do not count noise block as a
                                          % distinct class
for k = 1:num_class
    knumTestPts = k*numTestPts;
    data = o(:,knumTestPts - numTestPts + 1:knumTestPts);
    disp(['r',num2str(k),' = ',num2str(r(:,k)')])
    disp(['o',num2str(k),' = '])
    disp(num2str(data'))
    disp(' ')
end

return


function plotProjection(o,r,numTestPts,method,fig)

%****************************************************************************************
% Function
%   PLOTS projection of test data using weights and bias determined by the mean
%   separator neural network
%
% Use: plotProjection(o,r,numTestPts,method,fig)
%
% Input   o:          matrix of all test data projection
%         r:          matrix of class identification projection
%         numTestPts: number of test data points
%         method:     method number
%         fig:        figure number
%
% Limitations: - o and r can only contain 3 rows of data
%              - only 5 classes can be plotted
%
% Returns (none)
%
% 12 January 2000
% Miguel G. San Pedro
%****************************************************************************************
[n,all_classes] = size(o);
num_class = all_classes/numTestPts - 1;   % -1 to discount noise block as a distinct
                                          % class

% limit number of classes to plot to 5
if (num_class > 5)
    num_class = 5;
end

plot_char = ['b*';'r+';'go';'cs';'md'];

figure(fig)
orient tall

for k = 1:num_class
    % considers each class separately
    knumTestPts = k*numTestPts;
    data = o(:,knumTestPts - numTestPts + 1:knumTestPts);
    subplot(211)
    plot3(data(1,1:5:length(data)),data(2,1:5:length(data)),data(3,1:5:length(data)),...
        plot_char(k,:))
    hold on
    plot3(r(1,k),r(2,k),r(3,k),plot_char(k,:))
    subplot(234)
    plot(data(1,1:5:length(data)),data(2,1:5:length(data)),plot_char(k,:))
    hold on
    plot(r(1,k),r(2,k),plot_char(k,:))
    subplot(235)
    plot(data(2,1:5:length(data)),data(3,1:5:length(data)),plot_char(k,:))
    hold on
    plot(r(2,k),r(3,k),plot_char(k,:))
    subplot(236)
    plot(data(1,1:5:length(data)),data(3,1:5:length(data)),'b*')
    hold on
    plot(r(1,k),r(3,k),plot_char(k,:))
end

subplot(211)
title(['Test Data Projection (Method',num2str(method),')'])
xlabel('feature 1')
ylabel('feature 2')
zlabel('feature 3')
box on
grid on
hold off

subplot(234)
grid on
xlabel('feature 1')
ylabel('feature 2')
hold off

subplot(235)
grid on
xlabel('feature 2')
ylabel('feature 3')
hold off

subplot(236)
grid on
xlabel('feature 1')
ylabel('feature 3')
hold off

return


function dispWeightBias(w,b)

%****************************************************************************************
% Function
%   DISPLAYS weights and biases determined during training phase
%
% Use: dispWeightBias(w,b)
%
% Input   w: projection weight vector
%         b: projection bias
%
% Returns (none)
%
% 27 December 1999
% Miguel G. San Pedro
%****************************************************************************************

[num_prwise,num_class] = size(w);

% DISPLAY weights/bias and class type identifiers
for k = 1:num_class
    disp(['wNN',num2str(k),' = [',num2str(w(:,k)'),']  bNN',num2str(k),' = ',...
        num2str(b(k))]);
    disp(' ')
end

return


B.  CLASSIFICATION  METHODS 


This  section  contains  the  programs  used  to  determine  the  classification  capability 
of  the  specific  signal  typing  methods. 

1.  Statistical  Classifier 


a.  statClassifier.m 


function statClassifier(num_data,num_class,num_features,...
    class_mean,class_cov)

%****************************************************************************************
% Function
%   USES quadratic classifier to type classes
%
% Use: statClassifier(num_data,num_class,num_features,class_mean,class_cov)
%
% Input   num_data:     number of training realizations
%         num_class:    number of signal classes
%         num_features: number of distinguishing features
%         class_mean:   feature mean values
%         class_cov:    feature covariance matrix
%
% Returns (none)
%
% 7 March 2000
% Miguel G. San Pedro
%****************************************************************************************

% LOAD test points
load test\testClass.dat
[testRow,testData] = size(testClass);
if (10*num_data ~= testData)
    disp('*** DATA ERROR ***')
end

% SET class a priori probabilities for equiprobable classes
P = 1/num_class;

% LOAD stat classifier confusion matrix
load typeStat.dat
data = [];
tempMat = [];
for k = 1:num_class
    knum_feat = k*num_features;
    data = testClass(knum_feat - num_features + 1:knum_feat,:);
    distMat = [];
    for kk = 1:num_class
        kknum_feat = kk*num_features;
        dist = classDist(data,P,class_mean(:,kk),...
            class_cov(:,kknum_feat - num_features + 1:kknum_feat));
        distMat = [distMat; dist];
    end

    % TYPE each test point by its minimum classification distance
    type = zeros(1,num_class);
    for kk = 1:testData
        [y index] = min(distMat(:,kk),[],1);
        type(index) = type(index) + 1;
    end

    disp(['TYPE',num2str(k),': ',num2str(type)])

    [Statrow,Statcol] = size(typeStat);
    tempStat = typeStat(Statrow-(num_class-k),:);
    tempStat = tempStat + type;
    tempMat = [tempMat; tempStat];
end

typeStat = [typeStat; tempMat];
save typeStat.dat typeStat -ascii -tabs

return
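
statClassifier types each test point by the smallest quadratic distance d_i(x) = ln(det(C_i)) - 2 ln(P) + (x - m_i)' C_i^(-1) (x - m_i) over the candidate classes. The sketch below illustrates that decision rule on a two-class toy problem. All means, covariances, and test points are hypothetical, the 3-D array `C` is an assumed storage layout (the listing stores the covariances side by side in `class_cov`), and the backslash operator replaces the explicit `inv` call.

```matlab
% Toy illustration of minimum-quadratic-distance typing.
% All numeric values are hypothetical.
P  = 1/2;                              % equiprobable classes
mu = [0 4; 0 4];                       % class means stored as columns
C  = cat(3, eye(2), 2*eye(2));         % class covariance matrices
x  = [0.5 3.5; -0.2 4.1];              % two test points stored as columns

d = zeros(2,size(x,2));                % d(c,k): distance of point k to class c
for c = 1:2
    Cc = C(:,:,c);
    for k = 1:size(x,2)
        e = x(:,k) - mu(:,c);
        d(c,k) = log(det(Cc)) - 2*log(P) + e'*(Cc\e);
    end
end

[dmin, typed] = min(d,[],1);           % smallest distance selects the class
disp(typed)
```

The first point lies near the class 1 mean and the second near the class 2 mean, so `typed` is `[1 2]`. The log-determinant term penalizes classes with larger covariance, which matters only when the two quadratic terms are close.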


b.  classDist.m


function [dist] = classDist(data,classProb,classMean,classCov)

%****************************************************************************************
% Function
%   - DETERMINES classification distance for test data wrt a particular class'
%     statistics (as discussed by Brunzell/Eriksson)
%   - distance parameter given by
%     di(x) = ln(det(classCov)) - 2*lnP + (x-classMean)'*inv(classCov)*(x-classMean)
%
% Use: [dist] = classDist(data,classProb,classMean,classCov)
%
% Input   data:      m-dimensional test data to be typed (m rows)
%         classProb: class a priori probability
%         classMean: mx1 vector of class feature mean values
%         classCov:  mxm covariance matrix for class features
%
% Returns dist:      distance for each test data point
%
% 7 January 2000
% Miguel G. San Pedro
%****************************************************************************************

[dataRow,dataCol] = size(data);
dist = [];

c1 = log(det(classCov)) - 2*log(classProb);
c2 = inv(classCov);
for k = 1:dataCol
    c3 = data(:,k) - classMean;
    dist(k) = c1 + c3'*c2*c3;
end

return


2.  Perceptron 

a.  percptrnClassifier.m

function percptrnClassifier(num_class,snr,classData,w,b)

%****************************************************************************************
% Function
%   USES single-layer perceptron to type classes
%
% Use: percptrnClassifier(num_class,snr,classData,w,b)
%
% Input   num_class: number of signal classes
%         snr:       signal snr
%         classData: class data training set
%         w:         projection weight vector
%         b:         projection bias
%
% Returns (none)
%
% 15 January 2000
% Miguel G. San Pedro
%****************************************************************************************

[totFeatures,numData] = size(classData);
numFeatures = totFeatures/num_class;

% TRAINING PHASE
% ORGANIZE input/target vector
p = [];
t = [];
target = detTargVect(num_class);
for k = 1:num_class
    knumFeat = k*numFeatures;
    p = [p classData(knumFeat - numFeatures + 1:knumFeat,:)];
    t = [t target(:,[k*ones(1,numData)])];
end

[numNeurons,tCol] = size(t);
if (tCol ~= num_class*numData)
    disp('*** DATA ERROR')
end

net = newp(minmax(p),numNeurons,'hardlim','learnp');
w = w';
w = w([ones(1,numNeurons)],:);
net.iw{1,1} = w;
net.b{1} = b([ones(1,numNeurons)],:);
net.trainParam.epochs = 2500;
figure
[net,tr] = train(net,p,t);

disp('Final neuron weights and bias')
wNN = net.iw{1,1}
bNN = net.b{1}

maxEpoch = max(tr.epoch);
load snrEpoch.dat
snrEpoch = [snrEpoch; snr maxEpoch];
save snrEpoch.dat snrEpoch -ascii -tabs

load ..\test\testClass.dat
[testRow,numTestData] = size(testClass);

if (testRow ~= numFeatures*(num_class+1))
    disp('*** DATA ERROR')
end

% TESTING PHASE
% REORGANIZE testClass to place blocks of class test data in a row vice in a column
pTest = [];
for k = 1:num_class+1
    knumFeat = k*numFeatures;
    pTest = [pTest testClass(knumFeat - numFeatures + 1:knumFeat,:)];
end

tTest = sim(net,pTest);

% COUNT results
type1 = zeros(num_class+1,num_class);
noType1 = 0;
for k = 1:(num_class+1)*numTestData
    typeRow = ceil(k/numTestData);
    index = bi2de(flipud(tTest(:,k))');
    if ((index == 0) | (index > num_class))
        if (typeRow <= num_class)
            noType1 = noType1 + 1;   % do not count noType if random test data
        end
    else
        type1(typeRow,index) = type1(typeRow,index) + 1;
    end
end

% DISPLAY test data class typing
for m = 1:num_class+1
    disp(['type',num2str(m),': ',num2str(type1(m,:)),'  ',num2str(numTestData)])
end
disp(['no type: ',num2str(noType1)])
disp(' ')

load type.dat
type = type + type1;
save type.dat type -ascii -tabs

load noType.dat
noType = noType + noType1;
save noType.dat noType -ascii -tabs

return

b.  detTargVect.m

function [target] = detTargVect(num_class)

%****************************************************************************************
% Function
%   DETERMINES perceptron target vector
%
% Use: [target] = detTargVect(num_class)
%
% Input   num_class: number of signal classes
%
% Returns target:    vector of unique binary class representations
%
% Example: num_class = 6;
%          [target] = detTargVect(num_class)
%          class  = [1 2 3 4 5 6]
%          target = [0 0 0 1 1 1;
%                    0 1 1 0 0 1;
%                    1 0 1 0 1 0]
%
% 15 January 2000
% Miguel G. San Pedro
%****************************************************************************************

class = [1:num_class];
[target] = flipud(de2bi(class)');
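
The target matrix assigns each class the binary representation of its class number, most significant bit in the first row, and percptrnClassifier decodes a perceptron output column back to a class index with `bi2de`. As a hedged alternative sketch for installations without `de2bi`/`bi2de`, the same encoding and decoding can be reproduced with the base function `dec2bin`; the variable names `weights` and `index` are introduced here for illustration only.

```matlab
% Sketch: binary class targets without de2bi/bi2de (assumed substitute).
num_class = 6;

% dec2bin returns one row of '0'/'1' characters per class, MSB first;
% subtracting '0' converts the characters to numeric 0/1.
target = (dec2bin(1:num_class) - '0')';

% Decode each column back to its class number (MSB-first weighting).
weights = 2.^(size(target,1)-1:-1:0);
index = weights*target;

disp(target)
disp(index)
```

For six classes this reproduces the 3x6 target matrix given in the function header, and `index` recovers the class numbers 1 through 6.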


3.  Common  Mean  Separator  Programs 
a.  simmsnn.m 


function  simmsnn(f , method, classData, num_f eatures, w,b, tp, fig) 

%******************** ******************************************************************** 
%  Function 

%  SIMULATES  the  mean  separator  neural  network  with  performance  parameter  defined  by 
%  function  f 
% 

%  Use:  simmsnn(f  ,method,  classData, num_features,w,b,  tp,  fig) 


%  Input  f:  mean  separator  neural  network  function  method 

%  method:  mean  separator  variation  number 

%  1  -  standard 

%  2  -  preconditionied  input  (Mod  1) 

%  5  -  normalized  projection  (Mod  2) 

%  8  -  with  VMR  termination  (Mod  3) 

%  classData:  training  data 

%  w:  projection  weight  vector 

%  b:  projection  bias 

%  tp:  training  parameters  -(see  function  trms_sp) 

%  fig:  figure  number 

% 

%  Returns  (none) 

% 

%  6  March  2000 
%  Miguel  G.  San  Pedro 

%******************ic**********i(**-k*itic****ic***iciric***-)c*irir***itleifkir*****ific***ieic**ic**ic******** 

global  gloUsrReq 

[classRow,num_data]  =  size (classData) ; 
num_class  =  c las sRow/num_f eatures; 

num_j?rwise  =  sum(l :num__class-l) ;  %  number  of  pairwise  comparisons 

ind  =0;  %  pairwise  index 

r  =  zeros (num_prwise,num_class)  ?  %  class  type  identifier 

%**★**  ***************************************************  ***********************^  +  ^^^ 
%  COMPARE  class  k  and  class  kk 
for  k  =  l:num_class 

knum_feat  =  k*num_f eatures; 
for  kk  =  k+1 :num_class 

kknum_feat  =  kk*num_f eatures ; 
ind  =  ind  +  l; 

classl  =  classData (knum_feat~num_features+l :knum_f eat ,:) ; 
class2  =  classData (kknum_feat-num_features+l :kknum_f eat, :) ; 
pi  =  [classl ;class2 ] ; 

disp  (  [  ’  Class  1  ,num2str  (k)  ,  1  vs  Class  1  ,num2str  (kk)  ] ) 
fig  =  fig+1? 

[wNN( : ,  ind)  ,bNN(ind)  ]  =  feval  (f  ,w,b,pl,  tp, method,  fig)  ; 

%  DETERMINE  class  type  identifier  for  this  pairwise  conparison 
for  mm  =  l:num_class 

mmnum_f eat  -  mm*num_f eatures; 
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classA  =  classData(mmnum_feat-nuin_features+l  :mmnum_feat ,  : )  ; 
r(ind,mm)  =  20*mean(logsig(wNN( : ,  ind)  #  *classA  +  bNN(ind) ) )  -10; 

%  DETERMINE  projection  data  for  neuron  maps 

plotr  =  [plotr  20*logsig(wNN( : , ind) ' *classA  +  bNN(ind) ) -10] ; 

end 

%  PLOT  neuron  maps 
figure 
plot (plotr) 
xlabel('Test  Point') 

ylabel ( [ 'Neuron  Map  [ ' ,num2str (k) , ' , ' ,num2str (kk) ,']']) 
end 

end 

%  DISPLAY  weights /bias  and  class  type  identifiers 
if  (gloUsrReq  ==  'N') 

userReq  =  input ( 'Display  projection  weights  and  biases  (Y/N) : 
if  (userReq  ==  'Y') 

di spWe i ght Bi a s ( wNN  ,  bNN ) 

end 

disp('  ') 

end 

%************★*************************************************************************** 
%  CLASSIFY  test  points 
load  ..\test\testClass.dat 
[testRow, testData]  =  size (testClass) ; 
if  (testRow  num_features*  (num_class+l) ) 
disp ( '  ***  DATA  ERROR') 

end 

%  REORGANIZE  test  data  into  a  matrix  with  dimensions 

%  'num_features  'x'num_class'  *  'num_data' 

testCl  =  ( ] ; 

for  m  =  1  :num_class+l 

testCl  =  [testCl  testClass (  (m-1) *num_features+l :m*num_features, : ) ] ? 

end 

[testRow, totTestData]  =  size (testCl) ; 

if  ((testRow  ~=  num_features) | (totTestData  -=  (num_class+l) *testData) ) 
disp ( ' ***  DATA  ERROR') 

end 

%  PROJECT/TYPE  testClass  data 

%  'diff'  matrices  store  distances  from  class  type  identifiers  (r's)  to  data  projections 
%  (o's)  determine  best  fit  (i.e.  trial  data  typing)  by  deteriming  minimum  value  of  each 
%  row 

%  2nd  dimension  of  r  gives  number  of  classes,  testData  gives  number  of  test  data  points 
%  taking  column  number  of  each  testProj  point  and  performing  ceil  (colNum/ testData)  gives 
%  class  number 

testProj  =  [ ] ? 

typel  -  zeros {num_class+l,num_class) ; 
if  (gloUsrReq  ==  'N') 

userReq  =  input ( 'Display  typing  distance  data  (Y/N):  ','s'); 

else 

userReq  =  'N'; 
end 

for  m  =  1 : totTestData 
for  mm  =  1 :num_prwise 

o(mm,m)  =  20*logsig  (wNN( :  ,mm)  '  *testCl  (  :  ,m)  +bNN(mm)  )  -10; 

end 

testProj  =  [testProj  o(:,m)]; 
diff  =  []; 
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for  mm  =  l:num_class 

dist  =  o(:,m)  -  r ( : ,mm)  ; 
diff  =  [diff  dist ' *dist] ; 

end 

[y  index]  =  min(diff , [] ,2) ; 
classNumber  =  ceil  (m/testData) ; 

typel (classNumber, index)  -  typel (classNumber , index)  +  1; 


' , num2 s  t r ( index } , 1 


7 /num2str (y) ] ) 


' ,num2str (testData) ] ) 


if  (userReq  ==  'Y') 

disp( [num2str (diff) , 7 
if  (mod (m,  testData)  ==0) 
disp ('****') 

end 

end 

end 

disp ( 7  ' ) 

%  DISPLAY  test  data  class  typing 
for  m  =  1 :num_class+l 

disp  ( [ 7  type/  ,num2str  (m) ,  ' :  7  ,num2str  (typel  (m,  : ) )  #  - 

end 
disp ( 7  7) 

load  type.dat 

type  =  type  +  typel; 

save  type.dat  type  -ascii  -tabs 

%**************************************************************************************** 
%  PLOT  class  type  identifier  and  test  data  projections 
%  NOTE:  1.  can  only  plot  first  three  features 
%  2.  testProj  also  includes  projection  of  non- 

%  class  data 

if  (gloUsrReq  ==  7N7) 

userReq  =  input ('Plot  projections  (Y/N) :  ','s7)? 

if  (userReq  ==  'Y7 ) 
fig  =  fig+1; 

plotProjection (testProj (1:3, : ) , r (1:3 , : ) , testData, method, fig) 
end 

di sp ( 7  7 ) 

end 

%  DISPLAY  class  type  identifier  and  test  data  projections 
%  NOTE:  testProj  also  includes  projection  of  non-class  data 

if  (gloUsrReq  ==  'N' ) 

userReq = input('Display projection data (Y/N): ','s');

if (userReq == 'Y')

dispProjection (testProj ,r, testData, method) 

end 

disp('  ') 

end 

return 
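
The typing rule described in the comments of this routine (project a test point through every pairwise neuron, then assign the class whose type identifier column of r is nearest in squared Euclidean distance) can be sketched as follows. This is only an illustration under stated assumptions: the weights, biases, class centers, and test point are hypothetical values, not data from this study, and logsig_fn is a local stand-in for logsig.m.

```matlab
% Minimal sketch of nearest-identifier typing; all numeric values are hypothetical.
logsig_fn = @(n) 1 ./ (1 + exp(-n));      % same squashing as logsig.m

wNN = [1 0 1; 0 1 -1];                    % 2 features x 3 pairwise neurons
bNN = [0 0 0];                            % one bias per pairwise neuron
centers = [2 0 -2; 0 2 -2];               % one representative point per class

% class type identifiers: one column per class, one row per pairwise neuron
r = 20*logsig_fn(wNN'*centers + bNN'*ones(1,3)) - 10;

x = [0.1; 1.9];                           % test point, close to the class 2 center
o = 20*logsig_fn(wNN'*x + bNN') - 10;     % projection of the test point

d = sum((r - o*ones(1,3)).^2, 1);         % squared distance to each identifier
[y, index] = min(d, [], 2);               % index is the assigned class number
disp(index)
```

With these values the test point lands nearest the second identifier, so index is 2. The listing above accumulates the same decision into the typel count matrix.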


b.  logsig.m

function  a  =  logsig(n,b) 

%  where  to  put:  c:\matlab\work\test 
%LOGSIG  Log  sigmoid  transfer  function. 

% 

%  LOGSIG(N) 

%  N  -  SxQ  Matrix  of  net  input  (column)  vectors. 

%  Returns  the  values  of  N  squashed  between  0  and  1. 

% 

%  EXAMPLE:  n = -10:0.1:10;
%            a = logsig(n);
%            plot(n,a)


% 

% 

%  LOGSIG(Z,B) ...Used when Batching.
%    Z - SxQ Matrix of weighted input (column) vectors.
%    B - Sx1 Bias (column) vector.

%  Returns  the  squashed  net  input  values  found  by  adding 

%  B  to  each  column  of  Z. 

% 

%  LOGSIG ( 'delta' )  returns  name  of  delta  function. 

%  LOGSIG (' init' )  returns  name  of  initialization  function. 

%  LOGSIG ( 'name' )  returns  full  name  of  this  transfer  function. 

%  LOGSIG ( 'output' )  returns  output  range  of  this  function. 

% 

%  See  also  NNTRANS,  BACKPROP,  NWTAN,  LOGSIG. 

%  Mark  Beale,  1-31-92 
%  Revised  12-15-93,  MB 

%  Copyright  (c)  1992-94  by  The  MathWorks,  Inc. 

%  $Revision:  1.1  $  $Date:  1994/01/11  16:25:39  $ 

if nargin < 1, error('Not enough arguments.'); end

if  isstr(n) 

if strcmp(lower(n),'delta')
a = 'deltalog';
elseif strcmp(lower(n),'init')
a = 'nwlog';
elseif strcmp(lower(n),'name')
a = 'Log Sigmoid';
elseif strcmp(lower(n),'output')
a = [0 1];
else
error('Unrecognized property.')
end 
else 

if  nargin==2 

[nr,nc] = size(n);
n = n + b*ones(1,nc);
end

a = 1 ./ (1+exp(-n));
end 


c.  sigderiv.m 

function  d=sigderiv(n) 


%***********************************************************
%  This function calculates the derivative of the logsig function
%  where to put:  c:\matlab\work\test
%***********************************************************

d = exp(-n)./((1+exp(-n)).^2);
i = find(~finite(d));
d(i) = 0;
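
The expression used in sigderiv.m is algebraically equal to a.*(1-a) with a = logsig(n). For large negative n, exp(-n) overflows and the direct quotient becomes Inf/Inf, which is presumably why non-finite entries are reset to zero. The sketch below compares the two forms; logsig_fn and the sample inputs are assumptions for illustration, and isfinite is used in place of the older finite call.

```matlab
% Sketch: direct derivative with non-finite guard versus the a.*(1-a) form.
logsig_fn = @(n) 1 ./ (1 + exp(-n));

n = [-800 -2 0 3];                             % -800 forces exp(-n) to overflow

d_direct = exp(-n) ./ ((1 + exp(-n)).^2);      % first entry evaluates to NaN
d_direct(~isfinite(d_direct)) = 0;             % same guard as sigderiv.m

a = logsig_fn(n);
d_alt = a .* (1 - a);                          % overflow-free equivalent form

max_gap = max(abs(d_direct - d_alt));
disp(max_gap)
```

Both forms agree to rounding error on these inputs, and the second form needs no guard.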


4.  Standard  Mean  Separator 
a.  trms_sp.m

function [wl,bl] = trms_sp(wl,bl,p,tp,method,fig)

%****************************************************************************************
%  Function
%  TRAINS the mean separator neural network with performance parameter defined as
%     MD = -[E{20*logsig(w'*x+b)-10} - E{20*logsig(w'*y+b)-10}]^2
%  to determine weight and bias for optimal projection
%
%  Use:      [wl,bl] = trms_sp(wl,bl,p,tp,method,fig)
%
%  Input     wl:      initial weight vector (3x1)
%            bl:      initial bias (1x1)
%            p:       matrix of training data for two classes
%            tp:      training parameters (see below)
%            method:  mean separator variation number
%                       1 - standard
%                       2 - preconditioned input (Mod 1)
%                       5 - normalized projection (Mod 2)
%                       8 - with VMR termination (Mod 3)

%  fig:  figure  number 

% 

%  Returns  wl :  optimized  weight  vector 

%  bl:  optimized  bias 

% 

%  26  February  2000 
%  Miguel  G.  San  Pedro 

%****************************************************************************************
%  MEAN SEPARATOR training function
%  GENERAL EQUATION
%     MD(w,b) = -[mean(20*logsig{w'*x+b}-10) - mean(20*logsig{w'*y+b}-10)]^2
%             = -[20*mean(logsig{w'*x+b})-10 - 20*mean(logsig{w'*y+b}) + 10]^2
%             = -400[mean(logsig{w'*x+b}) - mean(logsig{w'*y+b})]^2
%             = -400[mean(logsig{w'*x+b} - logsig{w'*y+b})]^2
%
%  DETERMINE gradient by
%     dMD/dw = c*d1
%        with c  = -800[mean(logsig{w'*x+b} - logsig{w'*y+b})]
%             d1 = mean(der_logsig{w'*x+b}*x - der_logsig{w'*y+b}*y,2)
%
%     dMD/db = c*d2
%        with d2 = mean(der_logsig{w'*x+b} - der_logsig{w'*y+b})
%
%  Training parameters (tp)
%     tp(1):  epochs between updating display
%     tp(2):  maximum number of epochs to train
%     tp(3):  initial learning rate
%     tp(4):  learning rate increase
%     tp(5):  learning rate decrease
%     tp(6):  momentum constant
%     tp(7):  maximum error ratio
%

%**************************************************************************************** 

global  gloUsrReq 
global  gloUsrPlot 

%  TRAINING  PARAMETERS 

df = tp(1);
me = tp(2);
lr = tp(3);
im = tp(4);
dm = tp(5);
mc = tp(6);
er = tp(7);

dwl  =  0; 
dbl  =  0; 

MC  =  0; 

[pRow,pCol] = size(p);

nx = zeros(pRow/2,pCol);
ny = nx;

nx(1:pRow/2,:) = p(1:pRow/2,:);
ny(1:pRow/2,:) = p(1+pRow/2:pRow,:);
logsig_x = logsig(wl'*nx+bl);
logsig_y = logsig(wl'*ny+bl);

a = -400*(mean(logsig_x - logsig_y,2))^2;

%  CHECK  how  weights  and  bias  are  changing 
%load ..\checkWB.dat

%  TRAINING 

if (gloUsrReq == 'N')

userReq = input('Display PROJ_INDEX update message (Y/N): ','s');

else

userReq = 'N';
end

if (userReq == 'Y')

message = sprintf('TRAINMSNN: %%g/%g epochs, PROJ_INDEX = %%g.\n',me);

fprintf(message,0,a)

disp(['lr = ',num2str(lr)])

end

ctr_repeat = 0;
go_on = 1;
ii = 1;
a_save = 0;
plot_a_save = 0;
plot_lr_save = 0;
wl_save = rand(pRow/2,1);
bl_save = rand(1);
while (go_on==1)

%  LEARNING  PHASE 

[dwl,dbl] = lrms_sp(wl,bl,p,dwl,dbl,lr,MC);

%  stepsize (alpha in steepest descent algorithm) incorporated as last step in lrms_sp
new_wl = wl-dwl;
new_bl = bl-dbl;

new_a = -400*(mean(logsig(new_wl'*nx+new_bl) - logsig(new_wl'*ny+new_bl),2))^2;

MC = mc;

%  PRESENTATION PHASE
if (new_a > a/er)
lr = lr*dm;

MC = 0;
else

if (new_a < a)
lr = lr*im;
end

wl = new_wl;
bl = new_bl;
a = new_a;

end

%  checkWB  = [checkWB;  [a  wl '  bl]]; 

%  TRAINING  RECORD 
%  PLOTTING 
plot_a(ii)  =  a; 
plot_lr(ii)  =  lr; 

%  DISPLAY  performance  parameter 
if  (userReq  ==  'Y') 

if  (rem(ii,df)  ==  0) 

fprintf(message,ii,a)
disp(['lr = ',num2str(lr)])

end 

end 

%  if lr falls below minimum allowable (no learning being accomplished), break out of loop
%  if final MD > -360, reset loop counter, choose new initial weights and bias and repeat
%  loop

if ((lr < 1e-4) | (ii == me))
if (abs(a_save) < abs(a))
a_save  =  a; 
wl_save  =  wl; 
bl_save  =  bl; 
plot_a_save  =  plot_a; 
plot_lr_save  =  plot_lr; 
end 

if ((a_save > -360) & (ctr_repeat <= 10))
ii = 0;
plot_a = [];
plot_lr = [];
wl = randn(pRow/2,1);
bl = randn(1,1);

a = -400*(mean(logsig(wl'*nx+bl) - logsig(wl'*ny+bl),2))^2;

dwl = 0;
dbl = 0;

MC = 0;
lr = tp(3);

ctr_repeat = ctr_repeat+1;

%  checkWB = [checkWB; 0001 zeros(size(wl')) NaN];

if (userReq == 'Y')

disp(' *** INSUFFICIENT PROJECTION INDEX ***')
disp(' ')

end

else 

go_on  =  0; 

end 

end 

ii = ii+1;

end 

disp(['num epochs = ',num2str(ii-1)])
disp(['lr = ',num2str(lr)])
disp(['MD = ',num2str(a_save)])

wl = wl_save;
bl = bl_save;
disp(' ')

if (gloUsrPlot == 'Y')
figure(fig)
orient tall
subplot(211)
plot(plot_a_save)
xlabel('time')
ylabel('MD')

title(['MD vs time (Method',num2str(method),')'])
grid on

subplot(212)
plot(plot_lr_save)
xlabel('time')
ylabel('lr')

title(['learning rate vs time (Method',num2str(method),')'])
grid on

end

%checkWB = [checkWB; 0001 ones(size(wl')) NaN];

%save ..\checkWB.dat checkWB -ascii -tabs

return


b.  lrms_sp.m


function [dw,db] = lrms_sp(w,b,p,dwl,dbl,lr,mc)


%****************************************************************************************
%  Function
%  Learning rate function for the mean separator neural network with performance
%  parameter defined as
%     MD = -[E{20*logsig(w'*x+b)-10} - E{20*logsig(w'*y+b)-10}]^2
%  to determine change in weight and bias for optimal projection
%
%  Use:      [dw,db] = lrms_sp(w,b,p,dwl,dbl,lr,mc)
%
%  Input     w:    weight vector (3x1)
%            b:    bias (1x1)
%            p:    matrix of training data for two classes
%            dwl:  current change in weight
%            dbl:  current change in bias
%            lr:   learning rate
%            mc:   momentum constant
%
%  Returns   dw:   weight vector change (3x1)
%            db:   bias change (1x1)
%
%  16 January 2000
%  Miguel G. San Pedro
%****************************************************************************************


[pRow,pCol] = size(p);
nx = zeros(pRow/2,pCol);
ny = nx;

nx(1:pRow/2,:) = p(1:pRow/2,:);
ny(1:pRow/2,:) = p(pRow/2+1:pRow,:);

logsig_x = logsig(w'*nx+b);
logsig_y = logsig(w'*ny+b);
der_logsig_x = sigderiv(w'*nx+b);
der_logsig_y = sigderiv(w'*ny+b);

%  replicate the derivative row across all feature rows
d11 = [];
d11 = der_logsig_x([ones(1,pRow/2)],:);
d12 = [];
d12 = der_logsig_y([ones(1,pRow/2)],:);
d1 = mean(d11.*nx - d12.*ny,2);

c = -800*(mean(logsig_x,2) - mean(logsig_y,2));
dw = c*d1;
db = c*mean(der_logsig_x - der_logsig_y,2);

%  APPLY adaptive lr and stepsize
dw = mc*dwl + (1-mc)*lr*dw;
db = mc*dbl + (1-mc)*lr*db;

return
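
The gradient expressions dMD/dw = c*d1 and dMD/db = c*d2 used by the standard mean separator can be checked numerically against central differences. The sketch below is an illustration only: the two small classes, the weight vector, and the bias are hypothetical values, and logsig_fn, sigder_fn, and md_fn are local stand-ins for logsig.m, sigderiv.m, and the performance parameter.

```matlab
% Sketch: analytic gradient of MD = -400*[mean(logsig(w'x+b) - logsig(w'y+b))]^2
% compared with central differences. All data values are hypothetical.
logsig_fn = @(n) 1 ./ (1 + exp(-n));
sigder_fn = @(n) exp(-n) ./ ((1 + exp(-n)).^2);
md_fn = @(w,b,x,y) -400*(mean(logsig_fn(w'*x+b) - logsig_fn(w'*y+b), 2))^2;

x = [0.5 1.0 1.5; 0.2 0.4 0.1];        % class 1: 2 features x 3 samples
y = [-1.0 -0.5 -0.8; 0.3 0.1 0.6];     % class 2: 2 features x 3 samples
w = [0.3; -0.2];
b = 0.1;

% analytic gradient, following the c*d1 and c*d2 expressions
c  = -800*mean(logsig_fn(w'*x+b) - logsig_fn(w'*y+b), 2);
dx = ones(2,1)*sigder_fn(w'*x+b);      % derivative row replicated per feature
dy = ones(2,1)*sigder_fn(w'*y+b);
gw = c*mean(dx.*x - dy.*y, 2);
gb = c*mean(sigder_fn(w'*x+b) - sigder_fn(w'*y+b), 2);

% central-difference gradient
h = 1e-6;
gw_fd = zeros(2,1);
for k = 1:2
    e = zeros(2,1);
    e(k) = h;
    gw_fd(k) = (md_fn(w+e,b,x,y) - md_fn(w-e,b,x,y))/(2*h);
end
gb_fd = (md_fn(w,b+h,x,y) - md_fn(w,b-h,x,y))/(2*h);

err = max(abs([gw - gw_fd; gb - gb_fd]));
disp(err)
```

A small err indicates that the analytic expressions and the performance parameter agree. The learning-rate and momentum scaling applied at the end of lrms_sp is deliberately left out of this check.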


c.  meansep_sp1.m


function a = meansep_sp1(p1,p2,w,b)


%  Function
%  CALCULATES the mean separator neural network with performance parameter defined as
%     MD(w,b) = -[mean(20*logsig{w'*x+b}-10) - mean(20*logsig{w'*y+b}-10)]^2
%
%  Use:      a = meansep_sp1(p1,p2,w,b)
%
%  Input     p1:  row feature vector for first class
%            p2:  row feature vector for second class
%            w:   weight vector
%            b:   bias
%
%  Returns   a:   mean separator performance parameter value
%
%  5 January 2000
%  Miguel G. San Pedro
%****************************************************************************************

if nargin < 3, error('Not enough arguments.'); end

alpha = logsig(w'*p1 + b);
beta = logsig(w'*p2 + b);

a = -400*(mean(alpha - beta,2))^2;

return


5.  Preconditioned Input Data (MSNN Mod 1): simmsnn_C.m


function simmsnn_C(classData_norm,num_features,w,b,tp,fig)

%****************************************************************************************
%  Function
%  SIMULATES the mean separator neural network with performance parameter defined as
%     MD = -[E{20*logsig(w'*[(x-mean(x))/sd(x)+mean(x)]+b)-10}
%            - E{20*logsig(w'*[(y-mean(y))/sd(y)+mean(y)]+b)-10}]^2
%
%  Use:      simmsnn_C(classData_norm,num_features,w,b,tp,fig)
%
%  Calls trms_sp and lrms_sp since equations are same; only input vectors differ
%
%  Input     classData_norm:  normalized training data
%            w:               projection weight vector
%            b:               projection bias
%            tp:              training parameters (see function trms_sp2)
%            fig:             figure number
%
%  Returns   (none)
%
%  23 February 2000
%  Miguel G. San Pedro
%****************************************************************************************
global  gloUsrReq 

method = 2;

[classRow,num_data] = size(classData_norm);
num_class = classRow/num_features;

num_prwise = sum(1:num_class-1);   %  number of pairwise comparisons

ind = 0;                           %  pairwise index

r = zeros(num_prwise,num_class);   %  class type identifier

%****************************************************************************************
%  COMPARE class k and class kk
for k = 1:num_class

knum_feat = k*num_features;
for kk = k+1:num_class

kknum_feat = kk*num_features;
ind = ind + 1;

class1 = classData_norm(knum_feat-num_features+1:knum_feat,:);
class2 = classData_norm(kknum_feat-num_features+1:kknum_feat,:);

p1 = [class1;class2];
disp(['Class ',num2str(k),' vs Class ',num2str(kk)])
fig = fig+1;

[wNN(:,ind),bNN(ind)] = trms_sp(w,b,p1,tp,method,fig);

%  DETERMINE class type identifier for this pairwise comparison
for mm = 1:num_class

mmnum_feat = mm*num_features;
classA = classData_norm(mmnum_feat-num_features+1:mmnum_feat,:);
r(ind,mm) = 20*mean(logsig(wNN(:,ind)'*classA + bNN(ind)))-10;

%  DETERMINE projection data for neuron maps
plotr = [plotr 20*logsig(wNN(:,ind)'*classA + bNN(ind))-10];

end

%  PLOT neuron maps
figure
plot(plotr)
xlabel('Test Point')
ylabel(['Neuron Map [',num2str(k),',',num2str(kk),']'])
end

end

%  DISPLAY  weights/bias  and  class  type  identifiers 
if (gloUsrReq == 'N')

userReq = input('Display projection weights and biases (Y/N): ','s');
if (userReq == 'Y')

dispWeightBias(wNN,bNN)

end 

disp ( '  ') 

end 

%****************************************************************************************

%  CLASSIFY test points

load ..\test\testClass_norm.dat

[testRow,testData] = size(testClass_norm);

numTestData = testData/(num_class+1);
if (testRow ~= num_features*num_class)
disp('*** DATA ERROR')

end

%  PROJECT/TYPE  testClass  data 

%  'diff' matrices store distances from class type identifiers (r's) to data projections
%  (o's); determine best fit (i.e. trial data typing) by determining minimum value of each
%  row

%  2nd  dimension  of  r  gives  number  of  classes,  testData  gives  number  of  test  data  points 
%  taking  column  number  of  each  testProj  point  and  performing  ceil  (colNum/ testData)  gives 
%  class  number 

typel = zeros(num_class+1,num_class);
if (gloUsrReq == 'N')

userReq = input('Display typing distance data (Y/N): ','s');

else 

userReq  =  'N'; 
end 

diffMat = [];
for k = 1:num_class

knum_feat = k*num_features;
xk = [knum_feat - num_features + 1:knum_feat];

diffRow = [];

for kk = 1:testData

for mm = 1:num_prwise

o(mm,kk) = 20*logsig(wNN(:,mm)'*testClass_norm(xk,kk))-10;
end

dist = o(:,kk) - r(:,k);
diffRow = [diffRow dist'*dist];

end

diffMat = [diffMat;diffRow];

end

[y index] = min(diffMat,[],1);

for k = 1:num_class+1

for kk = 1:numTestData

xx = (k-1)*numTestData+kk;
typel(k,index(xx)) = typel(k,index(xx)) + 1;

end

end

disp('  ') 

%  DISPLAY  test  data  class  typing 
for m = 1:num_class+1

disp(['type ',num2str(m),': ',num2str(typel(m,:))])

end

load  type.dat 

type  =  type  +  typel; 

save  type.dat  type  -ascii  -tabs 

%****************************************************************************************

%  PLOT  class  type  identifier  and  test  data  projections  -  option  not  permitted 

%**********************************************************************+**************** 

%  PLOT  class  type  identifier  and  test  data  projections  -  option  not  permitted 

return 
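
The header of simmsnn_C.m describes the preconditioned input as (x-mean(x))/sd(x)+mean(x). A minimal sketch of that transformation is given below. It assumes the mean and standard deviation are taken per feature row, which this listing does not state, and the data matrix is hypothetical.

```matlab
% Sketch of the preconditioning (x - mean(x))/sd(x) + mean(x).
% Assumption: statistics are taken along each feature row.
x = [1 2 3 4; 10 20 30 40];                 % 2 features x 4 samples (hypothetical)
n = size(x, 2);

mu = mean(x, 2);                            % per-feature mean
sd = std(x, 0, 2);                          % per-feature standard deviation

xn = (x - mu*ones(1,n)) ./ (sd*ones(1,n)) + mu*ones(1,n);

disp(mean(xn, 2)')                          % means are unchanged
disp(std(xn, 0, 2)')                        % spread is scaled to unit deviation
```

Under this assumption each feature keeps its mean while its spread is scaled to unit standard deviation, so the same trms_sp and lrms_sp routines can be reused on the transformed data.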


6.  Normalized  Projection  Space  (MSNN  Mod  2) 


a.  trms_sp5.m 

function [wl,bl] = trms_sp5(wl,bl,p,tp,method,fig)

%****************************************************************************************
%  Function
%  TRAINS the mean separator neural network with performance parameter defined as
%     MD = -[E{alpha - beta}]^2*[E{(alpha - E{alpha})^2}
%          + E{(beta - E{beta})^2} + delta]^-1
%  with alpha = logsig(w'*x+b), beta = logsig(w'*y+b), and delta precludes division by
%  zero, to determine weight and bias for optimal projection
%  NORMALIZES basic performance parameter (standard MSNN) by sum of projection
%  variances
%
%  Use:      [wl,bl] = trms_sp5(wl,bl,p,tp,method,fig)
%
%  Input     wl:      initial weight vector (3x1)
%            bl:      initial bias (1x1)
%            p:       matrix of training data for two classes
%            tp:      training parameters (see below)
%            method:  mean separator variation number
%                       1 - standard
%                       2 - preconditioned input (Mod 1)
%                       5 - normalized projection (Mod 2)
%                       8 - with VMR termination (Mod 3)
%            fig:     figure number
%
%  Returns   wl:      optimized weight vector
%            bl:      optimized bias
%
%  26 February 2000
%  Miguel G. San Pedro
%****************************************************************************************
%  MEAN SEPARATOR training function
%  GENERAL EQUATION
%     MD(w,b) = -[E{20*logsig(w'*x+b)-10} - E{20*logsig(w'*y+b)-10}]^2
%                *[var(20*logsig(w'*x+b)-10) + var(20*logsig(w'*y+b)-10) + delta]^-1
%             = -[E{20*logsig(w'*x+b)-10} - E{20*logsig(w'*y+b)-10}]^2
%                *[E{(20*logsig(w'*x+b)-10 - E{20*logsig(w'*x+b)-10})^2}
%                  + E{(20*logsig(w'*y+b)-10 - E{20*logsig(w'*y+b)-10})^2} + delta]^-1
%             = -[20*E{logsig(w'*x+b)}-10 - 20*E{logsig(w'*y+b)}+10]^2
%                *[E{(20*logsig(w'*x+b)-10 - 20*E{logsig(w'*x+b)}+10)^2}
%                  + E{(20*logsig(w'*y+b)-10 - 20*E{logsig(w'*y+b)}+10)^2} + delta]^-1
%             = -[E{logsig(w'*x+b)} - E{logsig(w'*y+b)}]^2
%                *[E{(logsig(w'*x+b) - E{logsig(w'*x+b)})^2}
%                  + E{(logsig(w'*y+b) - E{logsig(w'*y+b)})^2} + delta]^-1
%     let alpha = logsig(w'*x+b), beta = logsig(w'*y+b)
%             = -[E{alpha} - E{beta}]^2*[E{(alpha - E{alpha})^2} + E{(beta - E{beta})^2}
%                  + delta]^-1
%             = -[E{alpha - beta}]^2*[E{alpha^2 + beta^2}
%                  - E^2{alpha} - E^2{beta} + delta]^-1
%     or, MD = -[E{alpha - beta}]^2/[var(alpha) + var(beta) + delta]
%     note: if den is infinitesimally small, delta = 1e-10
%
%  DETERMINE gradient by
%     K = E{alpha - beta}/(E{alpha^2 + beta^2} - E^2{alpha} - E^2{beta} + delta)
%     dMD/dw = 2K[K*(E{alpha*dalpha/dw + beta*dbeta/dw}
%                    - E{alpha}E{dalpha/dw} - E{beta}E{dbeta/dw})
%                 - E{dalpha/dw - dbeta/dw}]
%     dMD/db = 2K[K*(E{alpha*dalpha/db + beta*dbeta/db}
%                    - E{alpha}E{dalpha/db} - E{beta}E{dbeta/db})
%                 - E{dalpha/db - dbeta/db}]
%
%  Training parameters (tp)
%     tp(1):  epochs between updating display
%     tp(2):  maximum number of epochs to train
%     tp(3):  initial learning rate
%     tp(4):  learning rate increase
%     tp(5):  learning rate decrease
%     tp(6):  momentum constant
%     tp(7):  maximum error ratio
%


global  gloUsrReq 
global  gloUsrPlot 

format  short  e 
delta = 1e-10;


%  TRAINING PARAMETERS

df = tp(1);
me = tp(2);
lr = tp(3);
im = tp(4);
dm = tp(5);
mc = tp(6);
er = tp(7);


dwl  =  0; 
dbl  =  0; 

MC  =  0; 

[pRow,pCol] = size(p);

nx = zeros(pRow/2,pCol);
ny = nx;

nx(1:pRow/2,:) = p(1:pRow/2,:);
ny(1:pRow/2,:) = p(1+pRow/2:pRow,:);
alpha = logsig(wl'*nx+bl);
beta = logsig(wl'*ny+bl);

E_alpha = mean(alpha,2);
E_beta = mean(beta,2);
var_alpha = var(alpha,1);
var_beta = var(beta,1);

num = (E_alpha - E_beta)^2;
den = var_alpha + var_beta;
if (den < 1e-10)
den = delta;

end

a = -num/den;

%  CHECK  how  mean  and  variance  are  updating 
%checkMD = [];
%checkMD = [checkMD; [num den]];

%  CHECK how weights and bias are changing
%load ..\checkWB.dat

%  TRAINING 

if (gloUsrReq == 'N')

userReq = input('Display PROJ_INDEX update message (Y/N): ','s');

else

userReq = 'N';

end

if (userReq == 'Y')

message = sprintf('TRAINMSNN: %%g/%g epochs, PROJ_INDEX = %%g.\n',me);
fprintf(message,0,a)
disp(['lr = ',num2str(lr)])

end

ctr_repeat = 0;
go_on = 1;
ii = 1;
a_save = 0;
plot_a_save = [];
plot_lr_save = [];
wl_save = rand(pRow/2,1);
bl_save = rand(1);

GOODcheck = 0;

while (go_on==1)

%  LEARNING  PHASE 

[dwl,dbl] = lrms_sp5(wl,bl,p,dwl,dbl,lr,MC);

%  stepsize (alpha in steepest descent algorithm) incorporated as
%  last step in lrms_sp5
new_wl = wl-dwl;
new_bl = bl-dbl;

new_alpha = logsig(new_wl'*nx+new_bl);
new_beta = logsig(new_wl'*ny+new_bl);

E_new_alpha = mean(new_alpha,2);
E_new_beta = mean(new_beta,2);
var_new_alpha = var(new_alpha,1);
var_new_beta = var(new_beta,1);

new_num = (E_new_alpha - E_new_beta)^2;
new_den = var_new_alpha + var_new_beta;
if (new_den < 1e-10)
new_den = delta;

end

new_a = -new_num/new_den;
MC = mc;


%  PRESENTATION PHASE
if (new_a > a/er)

lr = lr*dm;

MC = 0;

else

if (new_a < a)

lr = lr*im;

end

wl = new_wl;
bl = new_bl;
a = new_a;
num = new_num;
den = new_den;

end

%  checkWB  =  [checkWB;  [a  wl'  bl]]; 

%  checkMD  =  [checkMD;  [num  den] ] ; 

%  TRAINING  RECORD 
%  PLOTTING 
plot_a(ii)  =  a; 
plot_lr(ii)  =  lr; 

%  DISPLAY  performance  parameter 
if  (userReq  ==  'Y') 

if  (rem(ii,df)  ==  0) 

fprintf(message,ii,a)
disp(['lr = ',num2str(lr)])

end 

end 

%  CHECK improvement in performance parameter
if (abs(a_save) < abs(a))
a_save = a;
wl_save = wl;
bl_save = bl;
plot_a_save = plot_a;
plot_lr_save = plot_lr;

lr = lr/0.9;   %  prevents stalling training trajectory


%  CALCULATE termination parameter
%  Termination parameter:  considered with ratio of difference in Q(+-0.005) pts
%  and difference of means
%  Assume Gaussian distribution
%     1.65 gives 5.0% in tails
%     1.95 gives 2.5% in tails
%     2.52 gives 0.5% in tails

GOOD_alpha = logsig(wl_save'*nx+bl_save);
GOOD_beta = logsig(wl_save'*ny+bl_save);

E_GOOD_alpha = mean(GOOD_alpha,2);
E_GOOD_beta = mean(GOOD_beta,2);
var_GOOD_alpha = var(GOOD_alpha,1);
var_GOOD_beta = var(GOOD_beta,1);

GOODcheck = 1 - 2.52*(sqrt(var_GOOD_alpha) + sqrt(var_GOOD_beta))...
/abs(E_GOOD_alpha - E_GOOD_beta);

end

if ((lr < 1e-4) | (ii == me) | (GOODcheck > 0.90))
go_on = 0;

end

ii  =  ii+1;  %  INCREMENT  epoch  counter 

end 

disp(['num epochs = ',num2str(ii-1)])
disp(['lr = ',num2str(lr)])
disp(['MD = ',num2str(a_save)])
disp(['VMR = ',num2str(GOODcheck)])

wl = wl_save;
bl = bl_save;
disp(' ')

if  (gloUsrPlot  ==  'Y' ) 
figure ( fig) 
orient  tall 
subplot (211) 

plot(plot_a_save)

xlabel ( 'time' ) 
ylabel ( 'MD' ) 

title (['MD  vs  time  (Method' ,num2str  (method) ,')'] ) 
grid  on 

subplot (212) 
plot (plot_lr_save) 
xlabel  ( 'time' ) 
ylabel ( 'lr' ) 

title ( ['learning  rate  vs  time  (Method'  ,num2str (method)  ,')']) 
grid  on 

end 

%checkWB = [checkWB; 0005 ones(size(wl')) NaN];

%save ..\checkWB.dat checkWB -ascii -tabs

%save  checkMD.dat  checkMD  -ascii  -tabs 

return 
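
The termination parameter computed inside the training loop (displayed as VMR) is 1 - 2.52*(sd(alpha) + sd(beta))/|E{alpha} - E{beta}|, and training stops once it exceeds 0.90. The sketch below evaluates it for one tightly clustered pair of projections and one loosely clustered pair. The projection values are hypothetical and vmr_fn is a name introduced only for this illustration.

```matlab
% Sketch of the termination parameter from the training loop.
% vmr_fn is a hypothetical helper; the projection values are made up.
vmr_fn = @(a,b) 1 - 2.52*(sqrt(var(a,1)) + sqrt(var(b,1))) / abs(mean(a) - mean(b));

tight_a = [0.95 0.96 0.94 0.95];     % well separated, small spread
tight_b = [0.05 0.04 0.06 0.05];

loose_a = [0.9 0.6 0.8 0.7];         % same ordering, much larger spread
loose_b = [0.1 0.4 0.2 0.3];

vmr_tight = vmr_fn(tight_a, tight_b);
vmr_loose = vmr_fn(loose_a, loose_b);

disp([vmr_tight vmr_loose])
disp([vmr_tight > 0.90, vmr_loose > 0.90])   % which pair would stop training
```

Only the tightly clustered pair exceeds the 0.90 threshold, so under this criterion training would continue for the second pair until another stopping condition is met.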


b.  lrms_sp5.m


function  [dw,db]  =  lrms_sp5  (w,b,p, dwl, dbl,  lr,mc) 


%****************************************************************************************
%  Function
%  Learning rate function for the mean separator neural network with performance
%  parameter defined as
%     MD = -[E{alpha - beta}]^2*[E{(alpha - E{alpha})^2} + E{(beta - E{beta})^2}
%          + delta]^-1
%  with alpha = logsig(w'*x+b), beta = logsig(w'*y+b), and delta precludes division by
%  zero
%  note: if den is infinitesimally small, delta = 1e-10
%  Determines change in weight and bias for optimal projection
%
%  Use:      [dw,db] = lrms_sp5(w,b,p,dwl,dbl,lr,mc)
%
%  Input     w:    weight vector (3x1)
%            b:    bias (1x1)
%            p:    matrix of training data for two classes
%            dwl:  current change in weight
%            dbl:  current change in bias
%            lr:   learning rate
%            mc:   momentum constant
%
%  Returns   dw:   weight vector change (3x1)
%            db:   bias change (1x1)
%
%  16 January 2000
%  Miguel G. San Pedro


delta = 1e-10;
[pRow,pCol] = size(p);
nx = zeros(pRow/2,pCol);
ny = nx;

nx(1:pRow/2,:) = p(1:pRow/2,:);
ny(1:pRow/2,:) = p(1+pRow/2:pRow,:);

alpha = logsig(w'*nx+b);
E_alpha = mean(alpha,2);
beta = logsig(w'*ny+b);
E_beta = mean(beta,2);

dalpha_db = sigderiv(w'*nx+b);
E_dalpha_db = mean(dalpha_db,2);
dbeta_db = sigderiv(w'*ny+b);
E_dbeta_db = mean(dbeta_db,2);

dx = [];
dx = dalpha_db([ones(1,pRow/2)],:);
dy = [];
dy = dbeta_db([ones(1,pRow/2)],:);

dalpha_dw = dx.*nx;
E_dalpha_dw = mean(dalpha_dw,2);
dbeta_dw = dy.*ny;
E_dbeta_dw = mean(dbeta_dw,2);

alpha_mat = [];
alpha_mat = alpha([ones(1,pRow/2)],:);
beta_mat = [];
beta_mat = beta([ones(1,pRow/2)],:);
den = var(alpha,1) + var(beta,1);
if (den < 1e-10)
den = delta;

end

K = mean(alpha-beta,2)/den;

dw = 2*K*(K*(mean(alpha_mat.*dalpha_dw+beta_mat.*dbeta_dw,2) ...
- E_alpha*E_dalpha_dw - E_beta*E_dbeta_dw) - E_dalpha_dw + E_dbeta_dw);
db = 2*K*(K*(mean(alpha.*dalpha_db+beta.*dbeta_db,2) ...
- E_alpha*E_dalpha_db - E_beta*E_dbeta_db) - E_dalpha_db + E_dbeta_db);

%  APPLY adaptive lr and stepsize
dw = mc*dw + (1-mc)*lr*dw;
db = mc*db + (1-mc)*lr*db;

return


c.  meansep_sp5.m 

function a = meansep_sp5(p1,p2,w,b)

%****************************************************************************************
%  Function
%  CALCULATES the mean separator neural network with performance parameter defined as
%     MD = -[E{alpha - beta}]^2*[E{(alpha - E{alpha})^2}
%          + E{(beta - E{beta})^2} + delta]^-1
%  with alpha = logsig(w'*x+b), beta = logsig(w'*y+b), and delta precludes division by
%  zero
%  note: if den is infinitesimally small, delta = 1e-10
%  NORMALIZES basic performance parameter (Method 1) by sum of projection variances
%
%  Use:      a = meansep_sp5(p1,p2,w,b)
%
%  Input     p1:  matrix of features for first class
%            p2:  matrix of features for second class
%            w:   weight vector
%            b:   bias
%
%  Returns   a:   mean separator performance parameter value
%
%  5 January 2000
%  Miguel G. San Pedro
%****************************************************************************************

if nargin < 3, error('Not enough arguments.'); end

delta = 1e-10;

alpha = logsig(w'*p1 + b);
beta = logsig(w'*p2 + b);
num = (mean(alpha - beta,2))^2;
den = var(alpha) + var(beta);

if (den < 1e-10)
den = delta;

end

a = -num/den;
return
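
The gradient expressions stated for the normalized performance parameter, dMD/dw = 2K[K*(...) - E{dalpha/dw - dbeta/dw}] and the corresponding dMD/db, can be checked against central differences in the same way as for the standard MSNN. The sketch below is illustrative only: the data, weights, and bias are hypothetical, the population variance var(.,1) is used as in the training routine, and the delta guard is omitted because the denominator here is far from zero.

```matlab
% Sketch: analytic gradient of MD = -[E{alpha-beta}]^2/[var(alpha)+var(beta)]
% compared with central differences. All data values are hypothetical.
logsig_fn = @(n) 1 ./ (1 + exp(-n));
sigder_fn = @(n) exp(-n) ./ ((1 + exp(-n)).^2);
mdn_fn = @(w,b,x,y) -(mean(logsig_fn(w'*x+b) - logsig_fn(w'*y+b), 2))^2 ...
    / (var(logsig_fn(w'*x+b),1) + var(logsig_fn(w'*y+b),1));

x = [0.5 1.0 1.5; 0.2 0.4 0.1];        % class 1: 2 features x 3 samples
y = [-1.0 -0.5 -0.8; 0.3 0.1 0.6];     % class 2: 2 features x 3 samples
w = [0.3; -0.2];
b = 0.1;

alpha = logsig_fn(w'*x+b);
beta  = logsig_fn(w'*y+b);
dalpha_db = sigder_fn(w'*x+b);
dbeta_db  = sigder_fn(w'*y+b);
dalpha_dw = (ones(2,1)*dalpha_db).*x;
dbeta_dw  = (ones(2,1)*dbeta_db).*y;

K = mean(alpha - beta, 2) / (var(alpha,1) + var(beta,1));
gw = 2*K*(K*(mean((ones(2,1)*alpha).*dalpha_dw + (ones(2,1)*beta).*dbeta_dw, 2) ...
     - mean(alpha,2)*mean(dalpha_dw,2) - mean(beta,2)*mean(dbeta_dw,2)) ...
     - mean(dalpha_dw,2) + mean(dbeta_dw,2));
gb = 2*K*(K*(mean(alpha.*dalpha_db + beta.*dbeta_db, 2) ...
     - mean(alpha,2)*mean(dalpha_db,2) - mean(beta,2)*mean(dbeta_db,2)) ...
     - mean(dalpha_db,2) + mean(dbeta_db,2));

% central-difference gradient
h = 1e-6;
gw_fd = zeros(2,1);
for k = 1:2
    e = zeros(2,1);
    e(k) = h;
    gw_fd(k) = (mdn_fn(w+e,b,x,y) - mdn_fn(w-e,b,x,y))/(2*h);
end
gb_fd = (mdn_fn(w,b+h,x,y) - mdn_fn(w,b-h,x,y))/(2*h);

rel_err = max(abs([gw - gw_fd; gb - gb_fd])) / max(abs([gw; gb]));
disp(rel_err)
```

A small rel_err supports the gradient expressions as written in the header. The check covers only the raw gradient, not the learning-rate and momentum scaling applied at the end of lrms_sp5.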


7.  Standard  MSNN  with  VMR  Termination  (MSNN  Mod  3) 
a.  trms_sp8.m

function [wl,bl] = trms_sp8(wl,bl,p,tp,method,fig)

%****************************************************************************************
%  Function
%  TRAINS the mean separator neural network with performance parameter defined as
%     MD = -[E{20*logsig(w'*x+b)-10} - E{20*logsig(w'*y+b)-10}]^2
%  to determine weight and bias for optimal projection
%
%  Use:      [wl,bl] = trms_sp8(wl,bl,p,tp,method,fig)
%
%  Input     wl:      initial weight vector (3x1)
%            bl:      initial bias (1x1)
%            p:       matrix of training data for two classes
%            tp:      training parameters (see below)
%            method:  mean separator variation number
%                       1 - standard
%                       2 - preconditioned input (Mod 1)
%                       5 - normalized projection (Mod 2)
%                       8 - with VMR termination (Mod 3)
%            fig:     figure number
%
%  Returns   wl:      optimized weight vector
%            bl:      optimized bias
%
%  26 February 2000
%  Miguel G. San Pedro
%****************************************************************************************
%  MEAN SEPARATOR training function
%  GENERAL EQUATION
%     MD(w,b) = -[mean(20*logsig{w'*x+b}-10) - mean(20*logsig{w'*y+b}-10)]^2
%             = -[20*mean(logsig{w'*x+b})-10 - 20*mean(logsig{w'*y+b}) + 10]^2
%             = -400[mean(logsig{w'*x+b}) - mean(logsig{w'*y+b})]^2
%             = -400[mean(logsig{w'*x+b} - logsig{w'*y+b})]^2
%
%  DETERMINE gradient by
%     dMD/dw = c*d1
%        with c  = -800[mean(logsig{w'*x+b} - logsig{w'*y+b})]
%             d1 = mean(der_logsig{w'*x+b}*x - der_logsig{w'*y+b}*y,2)
%
%     dMD/db = c*d2
%        with d2 = mean(der_logsig{w'*x+b} - der_logsig{w'*y+b})
%
%  Training parameters (tp)
%     tp(1):  epochs between updating display
%     tp(2):  maximum number of epochs to train
%     tp(3):  initial learning rate
%     tp(4):  learning rate increase
%     tp(5):  learning rate decrease
%     tp(6):  momentum constant
%     tp(7):  maximum error ratio
%

global  gloUsrReq 
global  gloUsrPlot 


format  short  e 
delta = 1e-10;


%  TRAINING PARAMETERS

df = tp(1);
me = tp(2);
lr = tp(3);
im = tp(4);
dm = tp(5);
mc = tp(6);
er = tp(7);

dwl  =  0; 
dbl  =  0; 

MC  =  0; 

[pRow,pCol] = size(p);


nx = zeros(pRow/2,pCol);
ny = nx;

nx(1:pRow/2,:) = p(1:pRow/2,:);
ny(1:pRow/2,:) = p(1+pRow/2:pRow,:);

alpha = logsig(wl'*nx+bl);
beta = logsig(wl'*ny+bl);

E_alpha = mean(alpha,2);
E_beta = mean(beta,2);

a = -(E_alpha - E_beta)^2;

%  CHECK  how  weights  and  bias  are  changing 
%load ..\checkWB.dat

%  TRAINING 

if (gloUsrReq == 'N')

userReq = input('Display PROJ_INDEX update message (Y/N): ','s');

else

userReq = 'N';
end

if (userReq == 'Y')

message = sprintf('TRAINMSNN: %%g/%g epochs, PROJ_INDEX = %%g.\n',me);

fprintf(message,0,a)

disp(['lr = ',num2str(lr)])

end

ctr_repeat  =  0; 
go_on  =  1; 
ii  =  1; 
a_save  =  0; 
plot_a_save = [];
plot_lr_save = [];
wl_save = rand(pRow/2,1);
bl_save = rand(1);

GOODcheck = 0;

while (go_on==1)

%  LEARNING  PHASE 

[dwl,dbl] = lrms_sp8(wl,bl,p,dwl,dbl,lr,MC);

%  stepsize (alpha in steepest descent algorithm) incorporated as
%  last step in lrms_sp8
new_wl = wl-dwl;
new_bl = bl-dbl;

new_alpha = logsig(new_wl'*nx+new_bl);
new_beta = logsig(new_wl'*ny+new_bl);

E_new_alpha = mean(new_alpha,2);
E_new_beta = mean(new_beta,2);

new_num = (E_new_alpha - E_new_beta)^2;
new_a = -new_num;

MC = mc;

%  PRESENTATION PHASE
if (new_a > a/er)
lr = lr*dm;

MC = 0;
else

if (new_a < a)
lr = lr*im;
end

wl = new_wl;
bl = new_bl;
a = new_a;

end

    % checkWB = [checkWB; [a wl' bl]];
    % checkMD = [checkMD; [num den]];

    % TRAINING RECORD
    % PLOTTING
    plot_a(ii) = a;
    plot_lr(ii) = lr;

    % DISPLAY performance parameter
    if (userReq == 'Y')
        if (rem(ii,df) == 0)
            fprintf(message,ii,a)
            disp(['lr = ',num2str(lr)])
        end
    end

%  CHECK  improvement  in  performance  parameter 
    % CHECK improvement in performance parameter
    if (abs(a_save) < abs(a))
        a_save = a;
        wl_save = wl;
        bl_save = bl;
        plot_a_save = plot_a;
        plot_lr_save = plot_lr;

        lr = lr/0.9;  % prevents stalling training trajectory

        % CALCULATE termination parameter

        % Termination parameter: considered with ratio of difference in Q(+-0.005) pts
        % and difference of means

        % Assume Gaussian distribution
        % 1.65 gives 5.0% in tails
        % 1.95 gives 2.5% in tails
        % 2.52 gives 0.5% in tails
        GOOD_alpha = logsig(wl_save'*nx+bl_save);
        GOOD_beta = logsig(wl_save'*ny+bl_save);

        E_GOOD_alpha = mean(GOOD_alpha,2);
        E_GOOD_beta = mean(GOOD_beta,2);
        var_GOOD_alpha = var(GOOD_alpha,1);
        var_GOOD_beta = var(GOOD_beta,1);

        GOODcheck = 1 - 2.52*(sqrt(var_GOOD_alpha) + sqrt(var_GOOD_beta))...
                        /abs(E_GOOD_alpha - E_GOOD_beta);
    end


    if ((dr < 1e-4) | (ii == me) | (GOODcheck > 0.90))
        go_on = 0;
    end

    ii = ii+1;  % INCREMENT epoch counter

end

disp(['num epochs = ',num2str(ii-1)])
disp(['lr = ',num2str(lr)])
disp(['MD = ',num2str(a_save)])
disp(['VMR = ',num2str(GOODcheck)])


wl = wl_save;
bl = bl_save;
disp(' ')

if (gloUsrPlot == 'Y')
    figure(fig)
    orient tall
    subplot(211)
    plot(plot_a_save)
    xlabel('time')
    ylabel('MD')
    title(['MD vs time (Method ',num2str(method),')'])
    grid on

    subplot(212)
    plot(plot_lr_save)
    xlabel('time')
    ylabel('lr')
    title(['learning rate vs time (Method ',num2str(method),')'])
    grid on
end


%checkWB = [checkWB; 0005 ones(size(wl')) NaN];
%save  ..\checkWB.dat  checkWB  -ascii  -tabs 

%save  checkMD.dat  checkMD  -ascii  -tabs 


return 
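
The accept/reject logic in the PRESENTATION PHASE of the training loop can be summarized as a piecewise update. This is only a reading of the listing, not a statement taken from the thesis text. The symbols are shorthand introduced here: $\tilde{a}$ stands for `new_a`, $(\tilde{\mathbf{w}}, \tilde{b})$ for `new_wl`, `new_bl`, and the cases are tested from top to bottom. It assumes `er > 1` and uses the fact that the performance parameter `a` is never positive, so `a/er` lies between `a` and zero.

```latex
\[
(\mathbf{w},\, b,\, a,\, lr) \leftarrow
\begin{cases}
(\mathbf{w},\, b,\, a,\, lr \cdot dm), \quad MC \leftarrow 0
    & \text{if } \tilde{a} > a/er \quad \text{(trial step rejected)} \\[4pt]
(\tilde{\mathbf{w}},\, \tilde{b},\, \tilde{a},\, lr \cdot im)
    & \text{if } \tilde{a} < a \quad \text{(accepted, performance improved)} \\[4pt]
(\tilde{\mathbf{w}},\, \tilde{b},\, \tilde{a},\, lr)
    & \text{otherwise} \quad \text{(accepted, learning rate unchanged)}
\end{cases}
\]
```

Under these assumptions a trial step is discarded only when it degrades the performance parameter by more than the factor `er`. Small degradations are tolerated, which lets the search move off flat regions of the projection index.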


b.  lrms_sp8.m

function [dw,db] = lrms_sp8(w,b,p,dwl,dbl,lr,mc)

%**********************************************************************************
%  Function
%
%  Learning rate function for the mean separator neural network with performance
%  parameter defined as
%
%       MD = -[E{20*logsig(w'*x+b)-10} - E{20*logsig(w'*y+b)-10}]^2
%
%  to determine change in weight and bias for optimal projection
%
%  Use:  [dw,db] = lrms_sp8(w,b,p,dwl,dbl,lr,mc)
%
%  Input    w:    weight vector (3x1)
%           b:    bias (1x1)
%           p:    matrix of training data for two classes
%           dwl:  current change in weight
%           dbl:  current change in bias
%           lr:   learning rate
%           mc:   momentum constant
%
%  Returns  dw:   weight vector change (3x1)
%           db:   bias change (1x1)
%
%  26 February 2000
%  Miguel G. San Pedro
%**********************************************************************************

% Split the training matrix into the two classes (upper and lower halves of p)
[pRow,pCol] = size(p);
nx = zeros(pRow/2,pCol);
ny = nx;

nx(1:pRow/2,:) = p(1:pRow/2,:);
ny(1:pRow/2,:) = p(pRow/2+1:pRow,:);

% Projections of both classes and their derivatives
logsig_x = logsig(w'*nx+b);
logsig_y = logsig(w'*ny+b);
der_logsig_x = sigderiv(w'*nx+b);
der_logsig_y = sigderiv(w'*ny+b);

% Replicate the derivative row once per input dimension
dl1 = [];
dl1 = der_logsig_x([ones(1,pRow/2)],:);
dl2 = [];
dl2 = der_logsig_y([ones(1,pRow/2)],:);
dl = mean(dl1.*nx - dl2.*ny,2);

c = -800*(mean(logsig_x,2) - mean(logsig_y,2));
dw = c*dl;
db = c*mean(der_logsig_x - der_logsig_y,2);

% APPLY adaptive lr and stepsize
dw = mc*dwl + (1-mc)*lr*dw;
db = mc*dbl + (1-mc)*lr*db;

return
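
The constant -800 in `lrms_sp8` can be checked by differentiating the performance parameter given in the function header. The short derivation below is a sketch under two assumptions that are not stated in this listing: that MD is the squared mean difference (the factor $800 = 2 \cdot 20^2$ suggests this), and that `sigderiv` returns the logistic derivative $\sigma'(n) = \sigma(n)\,(1-\sigma(n))$. Here $\sigma$ denotes `logsig`, $\mathbf{x}$ and $\mathbf{y}$ are columns of the two class halves of `p`, and $\bar{\sigma}_x$, $\bar{\sigma}_y$ are the class means of the projections.

```latex
\begin{align*}
MD &= -\bigl[\,E\{20\,\sigma(\mathbf{w}^{T}\mathbf{x}+b)-10\}
            - E\{20\,\sigma(\mathbf{w}^{T}\mathbf{y}+b)-10\}\,\bigr]^{2}
    = -400\,(\bar{\sigma}_x-\bar{\sigma}_y)^{2} \\[4pt]
\frac{\partial MD}{\partial \mathbf{w}}
   &= -800\,(\bar{\sigma}_x-\bar{\sigma}_y)\,
      \bigl[\,E\{\sigma'(\mathbf{w}^{T}\mathbf{x}+b)\,\mathbf{x}\}
            - E\{\sigma'(\mathbf{w}^{T}\mathbf{y}+b)\,\mathbf{y}\}\,\bigr] \\[4pt]
\frac{\partial MD}{\partial b}
   &= -800\,(\bar{\sigma}_x-\bar{\sigma}_y)\,
      \bigl[\,E\{\sigma'(\mathbf{w}^{T}\mathbf{x}+b)\}
            - E\{\sigma'(\mathbf{w}^{T}\mathbf{y}+b)\}\,\bigr] \\[4pt]
\Delta\mathbf{w}_{k} &= mc\,\Delta\mathbf{w}_{k-1}
      + (1-mc)\,lr\,\frac{\partial MD}{\partial \mathbf{w}},
\qquad
\Delta b_{k} = mc\,\Delta b_{k-1} + (1-mc)\,lr\,\frac{\partial MD}{\partial b}
\end{align*}
```

Because both classes contribute the same number of columns, the single call `mean(dl1.*nx - dl2.*ny,2)` equals the difference of the two expectations in the second line. The calling routine then subtracts $\Delta\mathbf{w}_k$ and $\Delta b_k$ from the current weights, so each step is a momentum-smoothed gradient descent on MD.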